Abstract

Model-based compressed sensing refers to compressed sensing with extra structure about the underlying sparse signal known a priori. Recent work has demonstrated that, for both deterministic and probabilistic models imposed on the signal, this extra information can be successfully exploited to enhance recovery performance. In particular, weighted $\ell_1$-minimization with a suitable choice of weights has been shown to improve performance, at least for a simple class of probabilistic models. In this paper, we consider a more general and natural class of probabilistic models where the underlying probabilities associated with the indices of the sparse signal have a continuously varying nature. We prove that when the measurements are obtained using a matrix with i.i.d. Gaussian entries, weighted $\ell_1$-minimization with weights that have a similar continuously varying nature successfully recovers the sparse signal from its measurements with overwhelming probability. It is known that standard $\ell_1$-minimization with uniform weights can recover sparse signals up to a known sparsity level, or an expected sparsity level in the case of a probabilistic signal model. With a suitable choice of weights, chosen based on our signal model, we show that weighted $\ell_1$-minimization can recover signals beyond the sparsity level achievable by standard $\ell_1$-minimization.

I. Introduction 

Compressed Sensing has emerged as a modern alternative to traditional sampling for compressible signals. Previously, the most common way to view recovery of signals from samples was based on the Nyquist criterion. According to the Nyquist criterion, a band-limited signal has to be sampled at a rate at least twice its bandwidth to allow exact recovery. In this case the band-limitedness of the signal is the extra known information that allows us to reconstruct the signal from its samples. In compressed sensing the additional structure considered is that the signal is sparse with respect to a certain known basis. Instead of sampling at the Nyquist rate and subsequently compressing, we obtain linear measurements of the signal, and the compression and measurement steps are combined by obtaining a much smaller number of linear measurements than would in general be required to reconstruct the signal.

After fixing the basis with respect to which the signal is sparse, the process of obtaining the measurements can be written as $\mathbf{y} = \mathbf{A}\mathbf{x}$, where $\mathbf{y} \in \mathbb{R}^m$ is the vector of measurements, $\mathbf{x} \in \mathbb{R}^n$ is the signal and $\mathbf{A} \in \mathbb{R}^{m \times n}$ represents the $m$ linear functionals acting on the signal $\mathbf{x}$. We call $\mathbf{A}$ the measurement matrix. The signal $\mathbf{x}$ is considered to have at most $k$ non-zero components, and we are typically interested in the scenario where $k$ is much smaller than $n$. Compressed Sensing revolves around the fact that for sparse signals, the number of linear measurements needed to reconstruct the signal can be significantly smaller than the ambient dimension of the signal itself. The reconstruction problem can be formulated as finding the sparsest solution $\mathbf{x}$ satisfying the constraints imposed by the linear measurements $\mathbf{y}$. This can be represented by

$$\min \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}.$$

This problem is combinatorial in nature and is in general NP-hard. This is because, for a given support size $k$, one needs to search through all $\binom{n}{k}$ possible supports of the signal. Seminal work by Candes and Tao in [1] and Donoho in [2] shows that under certain conditions on the measurement matrix $\mathbf{A}$, $\ell_1$ norm minimization, which can be recast as a linear program, can recover the signal from its measurements. Additionally, a random matrix with i.i.d. Gaussian entries with mean zero satisfies the required condition with overwhelming probability. Linear programming is known to have polynomial time complexity, and the above result tells us that for a large class of measurement matrices $\mathbf{A}$ we can solve an otherwise NP-hard combinatorial problem in polynomial time. Subsequently, iterative methods based on a greedy approach were formulated, which recover sparse signals from their measurements by obtaining an increasingly accurate approximation to the actual signal in each iteration. Examples of these include CoSaMP [3] and IHT [4]. The compressed sensing framework has also been generalized to the problem of recovering low rank matrices from compressed linear measurements represented by linear matrix equations [5], [6].

Most of the earlier literature on Compressed Sensing focused on the case where the only constraints on the signal $\mathbf{x}$ are those imposed by its measurements $\mathbf{y}$. On the other hand, it is natural to consider the case where, besides sparsity, there is certain additional information on the structure of the underlying signal. This is true in several applications, examples of which include encoding natural images (JPEG), MRI and DNA microarrays [7], [8]. This leads us to Model-Based Compressed Sensing, where the aim is to devise recovery methods specific to the signal model at hand. Furthermore, one would also want to quantify the possible benefits over the standard method (e.g., fewer required measurements for the same level of sparsity of the signal). This has been explored in some recent papers. The authors in [9] analyzed a deterministic signal model, where the support of the underlying signal is constrained to belong to a given known set. This defines a subset $\mathcal{M}$ of the set of all $k$-sparse signals, which is now the set of allowable signals. This results in an additional constraint on the original reconstruction problem:

$$\min \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}, \quad \mathbf{x} \in \mathcal{M}.$$

It was shown that an intuitive modification to the CoSaMP or IHT method succeeds in suitably exploiting the information about the model. The key property defined in [1], known as the Restricted Isometry Property, was adapted in [5] to a model-based setting. With this, it was shown that results similar to those of [1] can be obtained for model-based signal recovery.

As opposed to this, a probabilistic model, i.e., a Bayesian setting, was considered in [10]. Under this model there are certain known probabilities associated with the components of the signal $\mathbf{x}$. Specifically, $p_i$, $i = 1, 2, \ldots, n$, with $0 \le p_i \le 1$, are such that

$$P(x_i \text{ is non-zero}) = p_i, \quad i = 1, 2, \ldots, n.$$

The deterministic version of the same model, called the "nonuniform sparse model", was considered in [11]. For this, the use of weighted $\ell_1$-minimization was suggested, given by

$$\min \|\mathbf{x}\|_{w,1} \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x},$$

where $\|\mathbf{x}\|_{w,1} = \sum_{i=1}^{n} w_i |x_i|$ denotes the weighted $\ell_1$ norm of $\mathbf{x}$. The quantities $w_i$, $i = 1, 2, \ldots, n$, are some positive scalars. As in [2], [12], ideas based on high-dimensional polytope geometry were used to provide sufficient conditions under which weighted $\ell_1$-minimization recovers the sparse signal. This method was introduced earlier in [13], where it was used to analyze the robustness of $\ell_1$-minimization in recovering sparse signals from noisy measurements. The specific model considered in [10] can be described as follows. Consider a partition of the indices $1$ to $n$ into two disjoint sets $T_1$ and $T_2$. Let $p_i = P_1$ for $i \in T_1$ and $p_i = P_2$ for $i \in T_2$. As a natural choice, choose the weights in the weighted $\ell_1$-minimization as $w_i = W_1$ for $i \in T_1$ and $w_i = W_2$ for $i \in T_2$. The main result of [10] is that under certain conditions on the weights $W_1, W_2$ and on $\mathbf{A}$, weighted $\ell_1$-minimization can recover a signal drawn from the above model with overwhelming probability.

In this paper, we extend this approach to a more general class of signal models. In our model, the probabilities $p_i$ described above are assumed to vary in a "continuous" manner across the indices. The specific way in which the probabilities vary is governed by a known shape function $p(\cdot)$, which is continuous, non-negative and non-increasing. For a signal drawn from this model, we propose the use of weighted $\ell_1$-minimization to reconstruct it from its compressed linear measurements. In addition, to match the signal model, we propose that the weights be chosen according to a continuous, non-negative and non-decreasing shape function $f(\cdot)$. We prove that under certain conditions on $p(\cdot)$, $f(\cdot)$ and the measurement matrix $\mathbf{A}$, we can reconstruct the signal perfectly with overwhelming probability. We also suggest a way to select the shape function $f(\cdot)$ for the weights based on $p(\cdot)$. Our results are also applicable when the functions $p(\cdot)$ and $f(\cdot)$ are piecewise continuous, and the results in [10] can be obtained from ours as a special case.

The rest of the paper is organized as follows. In Section II we introduce the basic notation, formulate the exact questions that we set out to answer in this paper and state our main theorem. In Section III we focus on how weighted $\ell_1$-minimization behaves by restricting our attention to a special class of signals that are particularly suitable for analysis. We later show that the methods generalize to other simple classes, which is all we need to establish the main result of this paper. In Section IV we describe the key features of a typical signal drawn from our model. We also prove a suitable large deviation result for the probability that a signal drawn from our model lies outside this typical set. In Section V we provide numerical computations to demonstrate the results we derive, along with simulations to reinforce our results. We then conclude the paper with possibly interesting questions that may be explored in the future.

II. Problem Formulation 

A. Notation and parameters 

We denote scalars by lower case letters (e.g. $c$), vectors by bold lower case letters (e.g. $\mathbf{x}$), and matrices by bold upper case letters (e.g. $\mathbf{A}$). The probability of an event $E$ is denoted by $P(E)$. The $i$th standard unit vector in $\mathbb{R}^n$ is denoted by $\mathbf{e}_i = (0, 0, \ldots, 1, \ldots, 0)^T$, where the "1" is located in the $i$th position.

The underlying sparse signal is represented by $\mathbf{x} \in \mathbb{R}^n$ and the measurement matrix by $\mathbf{A} \in \mathbb{R}^{m \times n}$. The vector of observations is denoted by $\mathbf{y}$ and is obtained through linear measurements of $\mathbf{x}$ given by $\mathbf{y} = \mathbf{A}\mathbf{x}$. In general, without exploiting sparsity, we would need $n$ linear measurements to be able to recover the signal. The scalar $\alpha = \frac{m}{n}$ determines how many measurements we have as a fraction of $n$. We call this the compression ratio.

B. Model of the Sparse Signal 

Let $p : [0,1] \to [0,1]$ be a continuous, monotonically non-increasing function. We call $p(\cdot)$ the probability shape function; the reason for this name will become clear from the description below. The support of the signal $\mathbf{x}$ is decided by the outcome of $n$ independent Bernoulli random variables. In particular, if $E_i$ denotes the event that $i \in \mathrm{Supp}(\mathbf{x})$, then the events $E_i$, $i = 1, \ldots, n$, are independent and $P(E_i) = p\left(\frac{i}{n}\right)$. Although we assume throughout the paper that $p(\cdot)$ is continuous, our result generalizes to piecewise continuous functions as well. Let us denote by $k$ the cardinality of $\mathrm{Supp}(\mathbf{x})$. Note that under the above model $k$ is a random variable that can take any value from $0$ to $n$. The expected value of $k$ is given by $\sum_{i=1}^{n} p\left(\frac{i}{n}\right)$. We denote by $\delta$ the expected fractional sparsity, given by $\delta = \frac{1}{n}\mathbb{E}[k] = \frac{1}{n}\sum_{i=1}^{n} p\left(\frac{i}{n}\right)$.
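The sampling model can be illustrated with a short Python sketch. It assumes a hypothetical shape function $p(u) = 1 - u$ and, since the model above only specifies the support, draws the non-zero values as i.i.d. $N(0,1)$; both are illustrative choices rather than part of the paper.

```python
import random

def sample_signal(p, n, rng):
    """Draw x in R^n: index i (1-based) is in Supp(x) independently with
    probability p(i/n). Non-zero entries are i.i.d. N(0, 1), an illustrative choice."""
    return [rng.gauss(0.0, 1.0) if rng.random() < p(i / n) else 0.0
            for i in range(1, n + 1)]

def expected_fractional_sparsity(p, n):
    """delta = E[k] / n = (1/n) * sum_{i=1}^{n} p(i/n)."""
    return sum(p(i / n) for i in range(1, n + 1)) / n

p = lambda u: 1.0 - u                 # hypothetical non-increasing shape function
rng = random.Random(0)
n = 1000
x = sample_signal(p, n, rng)
k = sum(1 for xi in x if xi != 0.0)   # realized sparsity of this draw
print(k, n * expected_fractional_sparsity(p, n))  # realized k vs. E[k]
```

For this $p(\cdot)$ the expected sparsity is $\mathbb{E}[k] = \sum_{i=1}^{n}(1 - i/n) = (n-1)/2$, so the realized $k$ fluctuates around $499.5$ for $n = 1000$.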

As is standard in the compressed sensing literature, we assume that the entries of the measurement matrix $\mathbf{A}$ are i.i.d. Gaussian random variables with mean zero and variance one. The measurements are obtained as $\mathbf{y} = \mathbf{A}\mathbf{x}$.

C. Weighted $\ell_1$-minimization

When the fractional sparsity $\delta$ is much smaller than one, a signal sampled from the model described above is sparse. Hence, it is possible to recover the signal from its measurements $\mathbf{y}$ by $\ell_1$-minimization, which is formulated as

$$\min \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{y}.$$

However, this does not exploit the extra information available from the knowledge of the priors. Instead, we use weighted $\ell_1$-minimization to recover the sparse signal, which is captured by the following optimization problem:

$$\min \|\mathbf{x}\|_{w,1} \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{y}, \qquad (1)$$

where $\mathbf{w} \in \mathbb{R}^n$ is a vector of positive weights and $\|\mathbf{x}\|_{w,1} = \sum_{i=1}^{n} w_i |x_i|$ is the weighted $\ell_1$ norm of $\mathbf{x}$ for the weight vector $\mathbf{w}$. The weight vector $\mathbf{w}$ plays a central role in determining whether (1) successfully recovers the sparse signal $\mathbf{x}$. Intuitively, $\mathbf{w}$ should be chosen depending on $p(\cdot)$ so as to obtain the best performance (although at this point we have not precisely defined what this means). Keeping in mind the structure of $p_i$, $i = 1, \ldots, n$, we suggest using weights $w_i$, $i = 1, \ldots, n$, which have a similar structure. Formally, let $f : [0,1] \to \mathbb{R}_+$ be a non-negative, non-decreasing, continuous function. Then we choose the weights as $w_i = f\left(\frac{i}{n}\right)$. We call $f(\cdot)$ the weight shape function.
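Problem (1) can be recast as a linear program by writing $\mathbf{x} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \ge 0$. The sketch below assumes NumPy and SciPy's `scipy.optimize.linprog` with the HiGHS backend; the helper name `weighted_l1_min`, the weight shape $f(u) = 1 + u$, the support and the problem sizes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_min(A, y, w):
    """Solve min sum_i w_i |x_i| s.t. A x = y via the LP split x = u - v, u, v >= 0."""
    _, n = A.shape
    c = np.concatenate([w, w])            # objective: sum_i w_i (u_i + v_i)
    A_eq = np.hstack([A, -A])             # constraint: A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
n, m = 40, 25
f = lambda u: 1.0 + u                     # hypothetical non-decreasing weight shape
w = f(np.arange(1, n + 1) / n)            # w_i = f(i/n)
A = rng.standard_normal((m, n))           # i.i.d. N(0, 1) entries
x = np.zeros(n)
x[[1, 4, 7]] = [1.5, -2.0, 0.7]           # support on small indices, where p(.) is largest
y = A @ x
x_hat = weighted_l1_min(A, y, w)
print(np.max(np.abs(x_hat - x)))
```

With $\mathbf{w} = \mathbf{1}$ the same helper solves standard $\ell_1$-minimization, which makes it easy to compare the two on identical instances.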

D. Problem statement 

In this paper we try to answer the following two questions: 

• Given the problem parameters (the size of the matrix, defined by $m$ and $n$) and the functions $p(\cdot)$ and $f(\cdot)$, does weighted $\ell_1$-minimization in (1) recover the underlying sparse signal $\mathbf{x}$ with high probability?

• Given a probability shape function $p(\cdot)$ and a family of weight shape functions $\mathcal{W}$, how should we choose a function $f(\cdot) \in \mathcal{W}$ that has the best performance guarantees?

We answer the first question in the affirmative, provided that the functions $p(\cdot)$ and $f(\cdot)$ satisfy certain specified conditions. This is contained in the main result of this paper:

Theorem 1. Let the probability shape function $p(\cdot)$ and the weight shape function $f(\cdot)$ be given. Let $E$ be the event that weighted $\ell_1$-minimization described in (1) fails to recover the correct sparse vector $\mathbf{x}$. There exists a quantity $\psi_{tot}(p,f)$, which can be computed explicitly as described in Sections III and IV, such that whenever $\psi_{tot}(p,f) < 0$, the probability of failure $P(E)$ of weighted $\ell_1$-minimization decays exponentially with respect to $n$. More precisely, if for some $\epsilon > 0$ we have $\psi_{tot}(p,f) < -\epsilon$, then there exists a constant $c(\epsilon) > 0$ such that for large enough $n$, the probability of failure satisfies $P(E) \le e^{-n c(\epsilon)}$.

To answer the second question, we first define the measure of performance. For a given value of $\alpha = \frac{m}{n}$, let $\bar{\delta}$ be the maximum value of $\delta$ for which weighted $\ell_1$-minimization can be guaranteed to succeed with high probability. For a given family of weight functions $\mathcal{W}$, the weight function $f(\cdot)$ that has the highest $\bar{\delta}$ is said to have the best performance. In this paper we describe a method to choose a weight function in $\mathcal{W}$ and demonstrate the method in Section V for a linearly parameterized family.

III. Analysis of Weighted $\ell_1$-minimization

In this section, we analyze the performance of weighted $\ell_1$-minimization with weights specified by $f(\cdot)$ on a certain special class of sparse signals $\mathbf{x}$. As we will see in Section IV, we can easily generalize the analysis to signals drawn from our signal model.

Recall that the failure event $E$ from Theorem 1 is defined as the event that weighted $\ell_1$-minimization fails to recover the correct sparse vector $\mathbf{x}$. We call $P(E)$ the probability of failure. Without loss of generality we can assume that $\|\mathbf{x}\|_{w,1} = 1$, that is, $\mathbf{x}$ lies on a $(k-1)$-dimensional face of the weighted cross-polytope

$$\mathcal{P} = \{\mathbf{x} \in \mathbb{R}^n \text{ s.t. } \|\mathbf{x}\|_{w,1} \le 1\},$$

which is the weighted $\ell_1$-ball in $n$ dimensions. The specific face of $\mathcal{P}$ on which $\mathbf{x}$ lies is determined by the support of $\mathbf{x}$ (together with the signs of its non-zero entries). The probability of failure $P(E)$ can be written as

$$P(E) = \sum_{F \in \mathcal{F}} P(\mathbf{x} \in F)\,P(E \mid \mathbf{x} \in F),$$

where $\mathcal{F}$ is the set of all faces of the polytope $\mathcal{P}$. Then, as shown in [10], the event $\{E \mid \mathbf{x} \in F\}$ is precisely the event that there exists a non-zero $\mathbf{u} \in \mathrm{null}(\mathbf{A})$ such that $\|\mathbf{x} + \mathbf{u}\|_{w,1} \le \|\mathbf{x}\|_{w,1}$, conditioned on $\{\mathbf{x} \in F\}$. Also, since $\mathbf{A}$ has i.i.d. Gaussian entries, sampling from the null space of $\mathbf{A}$ is equivalent to sampling a subspace uniformly from the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$. So, conditioned on $\{\mathbf{x} \in F\}$, the event $\{E \mid \mathbf{x} \in F\}$ is the same as the event that a uniformly chosen $(n-m)$-dimensional subspace shifted to $\mathbf{x}$ intersects the polytope $\mathcal{P}$ at a point other than $\mathbf{x}$. The probability of this event is also called the complementary Grassmann angle for the face $F$ with respect to the polytope $\mathcal{P}$ under the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$. Based on work by Santalo [14] and McMullen [15], the complementary Grassmann angle can be expressed explicitly as a sum of products of internal and external angles:

$$P(E \mid \mathbf{x} \in F) = 2\sum_{s \ge 0}\;\sum_{G \in \mathcal{J}(m+1+2s)} \beta(F,G)\,\gamma(G,\mathcal{P}), \qquad (2)$$

where $\beta(F,G)$ and $\gamma(G,\mathcal{P})$ are the internal and external angles and $\mathcal{J}(r)$ is the set of all $r$-dimensional faces of $\mathcal{P}$. The definitions of internal and external angles can be found in [10]; we include them here for completeness.

• The internal angle $\beta(F,G)$ is the fraction of the volume of the unit ball covered by the cone obtained by observing the face $G$ from any point in the face $F$. The quantity $\beta(F,G)$ is defined to be zero if $F$ is not contained in $G$ and is defined to be one if $F = G$.

• The external angle $\gamma(G,\mathcal{P})$ is defined to be the fraction of the volume of the unit ball covered by the cone formed by the outward normals to the hyperplanes supporting $\mathcal{P}$ at the face $G$. If $G = \mathcal{P}$, then $\gamma(G,\mathcal{P})$ is defined to be one.

In this section we describe a method to obtain upper bounds for $P(E \mid \mathbf{x} \in F)$ by finding upper bounds on the internal and external angles described above. We first analyze $P(E \mid \mathbf{x} \in F)$ for the "simplest" class of faces. We denote by $F_k$ the face whose vertices are given by $\frac{1}{w_1}\mathbf{e}_1, \frac{1}{w_2}\mathbf{e}_2, \ldots, \frac{1}{w_k}\mathbf{e}_k$. Thus the vertices are defined by the first $k$ indices, and we call this face the "leading" $(k-1)$-dimensional face of $\mathcal{P}$. We will spend much of this section developing bounds for $P(E \mid \mathbf{x} \in F_k)$ for such "leading" faces. Then, in Section IV, we describe the typical set of our signal model and show that for the purposes of bounding $P(E)$ it is sufficient to consider a certain special class of faces. The bounds we develop in this section for "leading" faces can be easily generalized to faces belonging to this special class.

A. Angle Exponents for leading faces 

We define the family of leading faces $\mathcal{F}_1$ as the set of all faces whose vertices are given by $\frac{1}{w_1}\mathbf{e}_1, \frac{1}{w_2}\mathbf{e}_2, \ldots, \frac{1}{w_k}\mathbf{e}_k$ for some $k$. In this section we establish the following result, which bounds the internal and external angles related to a leading face of $\mathcal{P}$. We then use this result in the next section to provide an upper bound on $P(E \mid \mathbf{x} \in F)$ for $F \in \mathcal{F}_1$.

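Theorem 2. Let $\frac{k}{n} = \delta$ and $\frac{l}{n} = \tau$ with $\tau > \delta$. Let $G_l$ be the face whose vertices are given by $\frac{1}{w_1}\mathbf{e}_1, \frac{1}{w_2}\mathbf{e}_2, \ldots, \frac{1}{w_l}\mathbf{e}_l$, and let $F_k \subset G_l$ be the face whose vertices are given by $\frac{1}{w_1}\mathbf{e}_1, \frac{1}{w_2}\mathbf{e}_2, \ldots, \frac{1}{w_k}\mathbf{e}_k$. Then there exist quantities $\psi_{ext}(\tau)$ and $\psi_{int}(\delta,\tau)$, which can be computed explicitly as described in Section III-C2 and Section III-C3 respectively, such that for any $\epsilon > 0$ there exist integers $n_0(\epsilon)$ and $n_1(\epsilon)$ satisfying

• $n^{-1}\log\big(\beta(F_k,G_l)\big) \le \psi_{int}(\delta,\tau) + \epsilon$, for all $n \ge n_0(\epsilon)$,

• $n^{-1}\log\big(\gamma(G_l,\mathcal{P})\big) \le \psi_{ext}(\tau) + \epsilon$, for all $n \ge n_1(\epsilon)$.

The quantities $\psi_{int}(\delta,\tau)$ and $\psi_{ext}(\tau)$ are called the optimized internal angle exponent and the external angle exponent, respectively.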

Whenever either of the angle exponents described above is negative, the corresponding angle decays at an exponential rate with respect to the ambient dimension $n$. We prove Theorem 2 over the following two subsections by finding optimized internal and external angle exponents that satisfy the requirements of the theorem.

1) Internal Angle Exponent: In this subsection we find the optimized internal angle exponent $\psi_{int}(\delta,\tau)$ that satisfies the conditions described in Theorem 2. We begin by stating the following result from [10], which provides the expression for the internal angle of a face $F$ with respect to a face $G$ in terms of the weights associated with their vertices. Subsequently, we find an asymptotic upper bound on the exponent of the internal angle using that expression. In what follows, we denote by $HN(0,\sigma^2)$ the distribution of a half-normal random variable obtained by taking the absolute value of an $N(0,\sigma^2)$-distributed random variable.

Lemma 1 [10]. Define $\sigma_{1,k} = \sum_{p=1}^{k} w_p^2$. Let $Y_0 \sim N\left(0, \frac{1}{2}\right)$ be a normal random variable and let $Y_p \sim HN\left(0, \frac{w_{k+p}^2}{2\sigma_{1,k}}\right)$, $p = 1, 2, \ldots, l-k$, be independent half-normal random variables that are independent of $Y_0$. Define the random variable $Z = Y_0 - \sum_{p=1}^{l-k} Y_p$. Then

$$\beta(F_k, G_l) = \frac{\sqrt{\pi}}{2^{l-k}}\,p_Z(0),$$

where $p_Z(\cdot)$ is the density function of $Z$.
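We now proceed to derive an upper bound for the quantity $p_Z(0)$. Much of the analysis is along the lines of [2]. Let the random variable $S$ be defined as $S = \sum_{p=1}^{l-k} Y_p$, so that $Z = Y_0 - S$. Using the convolution integral and then integrating by parts (with $p_{Y_0}'(v) = -2v\,p_{Y_0}(v)$ and $F_S(0) = 0$), the density of $Z$ at zero can be written as

$$p_Z(0) = \int_{-\infty}^{\infty} p_{Y_0}(-v)\,p_S(v)\,dv = 2\int_0^{\infty} v\,p_{Y_0}(v)\,F_S(v)\,dv,$$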
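The integration-by-parts step can be sanity-checked numerically in the simplest case $l - k = 1$, where $S$ is a single half-normal variable and $p_Z(0)$ has the closed form $1/\sqrt{\pi(2\sigma^2+1)}$ (the density of $N(0, \tfrac{1}{2} + \sigma^2)$ at zero). This is a minimal sketch using midpoint-rule quadrature; the function name, the value $\sigma = 0.7$ and the grid sizes are our own illustrative choices.

```python
import math

def check_pz0_identity(sigma, n_grid=100000, v_max=12.0):
    """Compare int p_Y0(v) p_S(v) dv and 2 int v p_Y0(v) F_S(v) dv over [0, v_max]
    for Y0 ~ N(0, 1/2) and S ~ HN(0, sigma^2), against the closed form."""
    h = v_max / n_grid
    p_y0 = lambda v: math.exp(-v * v) / math.sqrt(math.pi)            # N(0, 1/2) density
    p_s = lambda v: math.sqrt(2.0 / math.pi) / sigma * math.exp(-v * v / (2.0 * sigma ** 2))
    F_s = lambda v: math.erf(v / (sigma * math.sqrt(2.0)))             # half-normal CDF
    lhs = rhs = 0.0
    for j in range(n_grid):
        v = (j + 0.5) * h                                              # midpoint nodes
        lhs += p_y0(v) * p_s(v)
        rhs += 2.0 * v * p_y0(v) * F_s(v)
    exact = 1.0 / math.sqrt(math.pi * (2.0 * sigma ** 2 + 1.0))
    return lhs * h, rhs * h, exact

lhs, rhs, exact = check_pz0_identity(0.7)
print(lhs, rhs, exact)
```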

where $F_S(\cdot)$ is the cumulative distribution function of $S$. Let $\mu_S$ be the mean of the random variable $S$. Then

$$p_Z(0) = \underbrace{2\int_0^{\mu_S} v\,p_{Y_0}(v)\,F_S(v)\,dv}_{I} + \underbrace{2\int_{\mu_S}^{\infty} v\,p_{Y_0}(v)\,F_S(v)\,dv}_{II}.$$

As in [2], the second term satisfies $II \le \int_{\mu_S}^{\infty} 2v\,p_{Y_0}(v)\,dv = \frac{1}{\sqrt{\pi}}e^{-\mu_S^2}$. As we will see later in this section, $\mu_S^2 \sim cn$ for some $c > 0$, and hence $II \sim e^{-cn}$. Since we are interested in computing the asymptotic exponent of $p_Z(0)$, we can ignore $II$ in this computation. To bound $I$, we use the Chernoff bound $F_S(v) \le \exp(-\Lambda_S^*(v))$, valid for $v \le \mu_S$, where $\Lambda_S^*(\cdot)$ denotes the rate function of $S$, i.e., the convex conjugate of its cumulant generating function $\Lambda_S(\cdot)$. So we get

$$I \le \frac{2}{\sqrt{\pi}}\int_0^{\mu_S} v\,e^{-v^2 - \Lambda_S^*(v)}\,dv. \qquad (3)$$

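For ease of notation, we define the following quantities:

$$s_{k+1,l} = \sum_{p=k+1}^{l} w_p, \qquad \Lambda(s) = \frac{s^2}{2} + \log\big(2\Phi(s)\big),$$

$$\Lambda_0(s) = \frac{1}{s_{k+1,l}}\sum_{p=k+1}^{l}\Lambda(w_p s), \qquad \Lambda_0^*(y) = \max_s\; sy - \Lambda_0(s).$$

Here $\Lambda(s)$ is the cumulant generating function (the logarithm of the moment generating function) of the standard half-normal random variable, and $\Phi(\cdot)$ is the standard normal distribution function. Writing $Y_p = \frac{w_{k+p}}{\sqrt{2\sigma_{1,k}}}|N_p|$ with $N_p \sim N(0,1)$, we have $\Lambda_S(t) = s_{k+1,l}\,\Lambda_0\big(t/\sqrt{2\sigma_{1,k}}\big)$, and therefore the relation between $\Lambda_S^*(v)$ and $\Lambda_0^*(y)$ is

$$\Lambda_S^*(v) = s_{k+1,l}\,\Lambda_0^*\left(\frac{\sqrt{2\sigma_{1,k}}}{s_{k+1,l}}\,v\right).$$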
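As a quick check of the formula for $\Lambda(s)$, the sketch below compares $s^2/2 + \log(2\Phi(s))$, evaluated through the identity $2\Phi(s) = \operatorname{erfc}(-s/\sqrt{2})$, with direct quadrature of $\log\mathbb{E}[e^{s|X|}]$ for $X \sim N(0,1)$. The function names and quadrature settings are illustrative.

```python
import math

def Lambda(s):
    """CGF of the standard half-normal |N(0,1)|: s^2/2 + log(2 Phi(s)),
    evaluated as s^2/2 + log(erfc(-s / sqrt 2))."""
    return 0.5 * s * s + math.log(math.erfc(-s / math.sqrt(2.0)))

def Lambda_quadrature(s, n_grid=100000, t_max=12.0):
    """log E[exp(s |X|)] by midpoint quadrature against the half-normal density."""
    h = t_max / n_grid
    total = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * h
        total += math.exp(s * t) * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * t * t)
    return math.log(total * h)

for s in (-2.0, -0.5, 1.0, 2.0):
    print(s, Lambda(s), Lambda_quadrature(s))
```

The finite-difference slope of `Lambda` at $s = 0$ recovers the half-normal mean $\sqrt{2/\pi}$, as expected from $\Lambda'(0) = \mathbb{E}|X|$.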

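We compute

$$\mu_S = \sum_{p=1}^{l-k}\mathbb{E}[Y_p] = \sqrt{\frac{2}{\pi}}\sum_{p=1}^{l-k}\frac{w_{k+p}}{\sqrt{2\sigma_{1,k}}} = \frac{s_{k+1,l}}{\sqrt{\pi\sigma_{1,k}}}.$$

Changing variables in (3) by substituting $v = \frac{s_{k+1,l}}{\sqrt{2\sigma_{1,k}}}\,y$, we get

$$I \le \frac{2}{\sqrt{\pi}}\,\frac{s_{k+1,l}^2}{2\sigma_{1,k}}\int_0^{\sqrt{2/\pi}} y\,\exp\left[-\frac{s_{k+1,l}^2}{2\sigma_{1,k}}\,y^2 - s_{k+1,l}\,\Lambda_0^*(y)\right]dy.$$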

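Now, as $w_i = f\left(\frac{i}{n}\right)$, we have

$$s_{k+1,l} = \sum_{i=k+1}^{l} f\left(\frac{i}{n}\right) = n\left(\int_\delta^\tau f(x)\,dx + o(1)\right).$$

Define $c_0(\delta,\tau) = \int_\delta^\tau f(x)\,dx$. This gives us $s_{k+1,l} = n\big(c_0(\delta,\tau) + o(1)\big)$. Similarly,

$$\sigma_{1,k} = \sum_{i=1}^{k} w_i^2 = \sum_{i=1}^{k} f^2\left(\frac{i}{n}\right) = n\left(\int_0^\delta f^2(x)\,dx + o(1)\right) = n\big(c_1(\delta) + o(1)\big),$$

where $c_1(\delta)$ is defined as $c_1(\delta) = \int_0^\delta f^2(x)\,dx$. This gives us

$$\frac{s_{k+1,l}^2}{2\sigma_{1,k}}\,y^2 + s_{k+1,l}\,\Lambda_0^*(y) = n\left(\frac{c_0^2}{2c_1}\,y^2 + c_0\,\Lambda_0^*(y) + o(1)\right) = n\big(\eta(y) + o(1)\big),$$

where $\eta(y)$ is defined as $\eta(y) = \frac{c_0^2}{2c_1}\,y^2 + c_0\,\Lambda_0^*(y)$. Using Laplace's method to bound $I$ as in [2], we get

$$I \le R_n\,e^{-n\eta(y^*)}, \qquad (4)$$

where $n^{-1}\log(R_n) = o(1)$ and $y^*$ is the minimizer of the convex function $\eta(y)$. Let

$$\Lambda_0^*(y^*) = \max_s\; s y^* - \Lambda_0(s) = s^* y^* - \Lambda_0(s^*).$$

Then the maximizing $s^*$ satisfies $\Lambda_0'(s^*) = y^*$, and from convex duality we have $\Lambda_0^{*\prime}(y^*) = s^*$. The minimizing $y^*$ of $\eta(y)$ satisfies

$$\frac{c_0^2}{c_1}\,y^* + c_0\,\Lambda_0^{*\prime}(y^*) = 0.$$

This gives

$$\frac{c_0^2}{c_1}\,y^* + c_0\,s^* = 0, \qquad (5)$$

$$\Lambda_0'(s^*) = -\frac{c_1}{c_0}\,s^*. \qquad (6)$$

First we approximate $\Lambda_0(s)$ as follows:

$$\Lambda_0(s) = \frac{1}{s_{k+1,l}}\sum_{p=k+1}^{l}\Lambda(w_p s) = \frac{1}{s_{k+1,l}}\sum_{p=k+1}^{l}\Lambda\left(f\left(\frac{p}{n}\right)s\right) = \frac{1}{c_0}\int_\delta^\tau \Lambda\big(s f(x)\big)\,dx + o(1).$$

From the above we obtain

$$\Lambda_0'(s) = \frac{1}{c_0}\int_\delta^\tau f(x)\,\Lambda'\big(s f(x)\big)\,dx + o(1).$$

Combining this with equation (6), we can determine $s^*$ up to an $o(1)$ error by finding the solution to the equation

$$\int_\delta^\tau f(x)\,\Lambda'\big(s^* f(x)\big)\,dx + c_1\,s^* = 0. \qquad (7)$$

We define the internal angle exponent as

$$\psi_{int}(\delta,\tau,y) \triangleq -(\tau-\delta)\log 2 - \eta(y) \qquad (8)$$

$$= -(\tau-\delta)\log 2 - \left(\frac{c_0^2}{2c_1}\,y^2 + c_0\,\Lambda_0^*(y)\right), \qquad (9)$$

and the optimized internal angle exponent as

$$\psi_{int}(\delta,\tau) = \max_y\;\psi_{int}(\delta,\tau,y) = -(\tau-\delta)\log 2 - \eta(y^*) \qquad (10)$$

$$= -(\tau-\delta)\log 2 - \left(\frac{c_0^2}{2c_1}\,y^{*2} + c_0\,\Lambda_0^*(y^*)\right), \qquad (11)$$

where $y^*$ is determined through $s^*$ from equation (5), and $s^*$ is determined by solving equation (7). From inequality (4) and Lemma 1, we see that the function $\psi_{int}(\delta,\tau)$ satisfies the conditions described in the first part of Theorem 2.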
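The recipe for $\psi_{int}(\delta,\tau)$ (solve (7) for $s^*$, obtain $y^*$ from (5), then evaluate (11)) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes $\delta > 0$ and a positive, bounded weight shape $f$ so that $|s f(x)|$ stays small enough for `math.erfc` not to underflow, replaces the integrals by midpoint sums, and finds the root of (7) by bisection. Bisection is valid because the left-hand side of (7) is increasing in $s$ (as $\Lambda$ is convex) and equals $\sqrt{2/\pi}\,c_0 > 0$ at $s = 0$. The test function $f(x) = 1 + x$ and the helper names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def Lambda(s):
    """Half-normal CGF: s^2/2 + log(2 Phi(s)), with 2 Phi(s) = erfc(-s / sqrt 2)."""
    return 0.5 * s * s + math.log(math.erfc(-s / SQRT2))

def dLambda(s):
    """Lambda'(s) = s + sqrt(2/pi) exp(-s^2/2) / erfc(-s / sqrt 2)."""
    return s + math.sqrt(2.0 / math.pi) * math.exp(-0.5 * s * s) / math.erfc(-s / SQRT2)

def midpoint(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    step = (b - a) / n
    return step * sum(g(a + (j + 0.5) * step) for j in range(n))

def psi_int(f, delta, tau):
    """Return (psi_int(delta, tau), s*, y*, eta(y*)) following (5), (7) and (11)."""
    c0 = midpoint(f, delta, tau)                       # c0 = int_delta^tau f(x) dx
    c1 = midpoint(lambda x: f(x) ** 2, 0.0, delta)    # c1 = int_0^delta f(x)^2 dx

    def g(s):                                          # left-hand side of (7)
        return midpoint(lambda x: f(x) * dLambda(s * f(x)), delta, tau) + c1 * s

    lo, hi = -1.0, 0.0                                 # g(0) > 0; push lo until g(lo) <= 0
    while g(lo) > 0.0:
        if lo <= -16.0:                                # keep |s f| small enough for erfc
            raise ValueError("root of (7) not bracketed")
        lo *= 2.0
    for _ in range(80):                                # bisection on the increasing g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    s_star = 0.5 * (lo + hi)
    y_star = -(c1 / c0) * s_star                       # equation (5)
    # c0 * Lambda0*(y*) = c0 s* y* - int_delta^tau Lambda(s* f(x)) dx
    c0_conj = c0 * s_star * y_star - midpoint(lambda x: Lambda(s_star * f(x)), delta, tau)
    eta = c0 ** 2 / (2.0 * c1) * y_star ** 2 + c0_conj
    return -(tau - delta) * math.log(2.0) - eta, s_star, y_star, eta

psi, s_star, y_star, eta = psi_int(lambda x: 1.0 + x, 0.1, 0.5)
print(psi, s_star, y_star)
```

Since $\Lambda_0^*(y) \ge 0$, the computed $\eta(y^*)$ is non-negative, so $\psi_{int}(\delta,\tau)$ never exceeds $-(\tau-\delta)\log 2$; the minimizer $y^*$ also falls inside the integration range $(0, \sqrt{2/\pi})$ used for $I$.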

2) External Angle Exponent: In this subsection we find the optimized external angle exponent $\psi_{ext}(\tau)$ that satisfies the statement of Theorem 2. We proceed as in the previous subsection and begin by stating the following lemma from [10], which provides the expression for the external angle of a face $G$ with respect to the polytope $\mathcal{P}$ in terms of the weights associated with its vertices. We then find an upper bound on the asymptotic exponent of the external angle.

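Lemma 2 [10]. The external angle $\gamma(G_l,\mathcal{P})$ is given by

$$\gamma(G_l,\mathcal{P}) = \frac{2^{n-l}}{\sqrt{\pi}^{\,n-l+1}}\int_0^\infty e^{-x^2}\prod_{i=l+1}^{n}\left(\int_0^{\frac{w_i x}{\sqrt{\sigma_{1,l}}}} e^{-y^2}\,dy\right)dx, \qquad \sigma_{1,l} = \sum_{i=1}^{l} w_i^2.$$

To simplify the expression for the external angle, we define the standard error function as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$

and rewrite the external angle as

$$\gamma(G_l,\mathcal{P}) = \sqrt{\frac{\sigma_{1,l}}{\pi}}\int_0^\infty e^{-\sigma_{1,l}x^2}\prod_{i=l+1}^{n}\operatorname{erf}(w_i x)\,dx.$$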
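The erf form of the external angle is easy to evaluate numerically. The sketch below (our own helper, midpoint rule on a truncated interval) checks it on two cases where the external angle of a vertex is known: the octahedron ($n = 3$, unit weights), whose six vertex normal cones partition $\mathbb{R}^3$ so that each vertex has external angle $1/6$, and a weighted rhombus ($n = 2$), where the normal cone at $\frac{1}{w_1}\mathbf{e}_1$ has opening angle $2\arctan(w_2/w_1)$ and hence external angle $\arctan(w_2/w_1)/\pi$. The weights $w_1 = 1$, $w_2 = 3$ are illustrative.

```python
import math

def external_angle(w, l, n_grid=40000):
    """gamma(G_l, P) = sqrt(sigma/pi) * int_0^inf exp(-sigma x^2) prod_{i>l} erf(w_i x) dx,
    where G_l has vertices e_i / w_i for the first l indices and sigma = sum_{i<=l} w_i^2."""
    sigma = sum(wi * wi for wi in w[:l])
    x_max = 8.0 / math.sqrt(sigma)       # exp(-sigma x^2) < 1e-27 beyond this point
    h = x_max / n_grid
    total = 0.0
    for j in range(n_grid):
        x = (j + 0.5) * h
        prod = 1.0
        for wi in w[l:]:
            prod *= math.erf(wi * x)
        total += math.exp(-sigma * x * x) * prod
    return math.sqrt(sigma / math.pi) * total * h

w1, w2 = 1.0, 3.0
g1 = external_angle([w1, w2], 1)         # vertex e_1 / w_1 of the weighted rhombus
g2 = external_angle([w2, w1], 1)         # vertex e_2 / w_2 (coordinates relabelled)
print(g1, math.atan(w2 / w1) / math.pi)
print(external_angle([1.0, 1.0, 1.0], 1))  # octahedron vertex, expected 1/6
```

For the rhombus, the four vertex external angles ($2g_1 + 2g_2$) sum to one, since the vertex normal cones partition the plane.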

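Similarly to the method used in the previous subsection,

$$\sigma_{1,l} = \sum_{i=1}^{l} w_i^2 = n\left(\int_0^\tau f^2(x)\,dx + o(1)\right) = n\big(c_2(\tau) + o(1)\big),$$

where $c_2(\tau)$ is defined as $c_2(\tau) = \int_0^\tau f^2(x)\,dx$. Substituting this (and absorbing the $o(1)$ term into $c_2$), we have

$$\gamma(G_l,\mathcal{P}) = \sqrt{\frac{\sigma_{1,l}}{\pi}}\int_0^\infty \exp\left[-n\left(c_2x^2 - \frac{1}{n}\sum_{i=l+1}^{n}\log\big(\operatorname{erf}(w_i x)\big)\right)\right]dx = \sqrt{\frac{\sigma_{1,l}}{\pi}}\int_0^\infty \exp\big[-n\zeta(x)\big]\,dx,$$

where $\zeta(x)$ is defined as $\zeta(x) = c_2x^2 - \frac{1}{n}\sum_{i=l+1}^{n}\log\big(\operatorname{erf}(w_i x)\big)$. Again using Laplace's method, we get

$$\gamma(G_l,\mathcal{P}) \le R_n\exp\big[-n\zeta(x^*)\big], \qquad (12)$$

where $x^*$ is the minimizer of $\zeta(x)$ and $n^{-1}\log(R_n) = o(1)$. The minimizing $x^*$ satisfies $2c_2x^* = G_0'(x^*)$, where

$$G_0(x) = \frac{1}{n}\sum_{i=l+1}^{n}\log\big(\operatorname{erf}(w_i x)\big).$$

We first approximate $G_0(x)$ as follows:

$$G_0(x) = \frac{1}{n}\sum_{i=l+1}^{n}\log\big(\operatorname{erf}(w_i x)\big) = \frac{1}{n}\sum_{i=l+1}^{n}\log\left(\operatorname{erf}\left(f\left(\frac{i}{n}\right)x\right)\right) = \int_\tau^1\log\big(\operatorname{erf}(x f(y))\big)\,dy + o(1).$$

So the minimizing $x^*$ can be computed up to an error of $o(1)$ by solving the equation

$$2c_2x = \int_\tau^1 \frac{f(y)\,\operatorname{erf}'\big(x f(y)\big)}{\operatorname{erf}\big(x f(y)\big)}\,dy, \qquad \operatorname{erf}'(z) = \frac{2}{\sqrt{\pi}}e^{-z^2}. \qquad (13)$$

We define the external angle exponent as

$$\psi_{ext}(\tau,x) \triangleq -\zeta(x) \qquad (14)$$

and the optimized external angle exponent as

$$\psi_{ext}(\tau) = \max_x\;\psi_{ext}(\tau,x) = -\zeta(x^*) = -\left(c_2x^{*2} - \int_\tau^1\log\big(\operatorname{erf}(x^* f(y))\big)\,dy\right), \qquad (15)$$

where $x^*$ can be obtained by solving equation (13).

From (12) and (15) it is clear that this function satisfies the conditions in the second part of Theorem 2. This completes the proof of both parts of the theorem.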
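Similarly, $\psi_{ext}(\tau)$ from (15) can be computed by solving (13) for $x^*$. The sketch below assumes a hypothetical weight shape $f(y) = 1 + y$ (any $f$ that is positive on $[\tau,1]$ and $\tau > 0$ would do), replaces the integrals by midpoint sums, and uses bisection. This is justified because $\log\operatorname{erf}$ is concave on $(0,\infty)$, so $\zeta$ is convex and the difference of the two sides of (13) is increasing in $x$. The helper names are our own.

```python
import math

def midpoint(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    step = (b - a) / n
    return step * sum(g(a + (j + 0.5) * step) for j in range(n))

def erf_ratio(z):
    """erf'(z) / erf(z), with erf'(z) = 2/sqrt(pi) * exp(-z^2)."""
    return 2.0 / math.sqrt(math.pi) * math.exp(-z * z) / math.erf(z)

def zeta(f, tau, x):
    """zeta(x) = c2 x^2 - int_tau^1 log(erf(x f(y))) dy, with c2 = int_0^tau f^2."""
    c2 = midpoint(lambda u: f(u) ** 2, 0.0, tau)
    return c2 * x * x - midpoint(lambda y: math.log(math.erf(x * f(y))), tau, 1.0)

def psi_ext(f, tau):
    """Optimized external angle exponent (15); x* solves (13) by bisection."""
    c2 = midpoint(lambda u: f(u) ** 2, 0.0, tau)
    h = lambda x: 2.0 * c2 * x - midpoint(lambda y: f(y) * erf_ratio(x * f(y)), tau, 1.0)
    lo, hi = 1e-8, 1.0                    # h(lo) < 0 since erf_ratio(z) ~ 1/z near 0
    while h(hi) < 0.0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    return -zeta(f, tau, x_star), x_star

f = lambda y: 1.0 + y                     # hypothetical positive weight shape
psi, x_star = psi_ext(f, 0.3)
print(psi, x_star)
```

Because $\log\operatorname{erf}(\cdot) < 0$, $\zeta(x) > 0$ for every $x > 0$, so the computed $\psi_{ext}(\tau)$ is always negative.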



B. Recovery Threshold for the family of leading faces 

In this section we use the bounds on the asymptotic exponents of the internal and external angles from Theorem 2 to find an upper bound for $P(E \mid \mathbf{x} \in F)$ for $F \in \mathcal{F}_1$. The main result of this section is the following theorem.

Theorem 3. Let $F_k \in \mathcal{F}_1$ be the leading face with $k = \delta n$. There exists a function $\psi_{tot}(\delta)$, which we call the total exponent of $F_k$, such that given $\epsilon > 0$ there exists $n(\epsilon)$ such that for all $n \ge n(\epsilon)$,

$$\frac{1}{n}\log\big(P(E \mid \mathbf{x} \in F_k)\big) \le \psi_{tot}(\delta) + \epsilon.$$

The function $\psi_{tot}(\delta)$ can be computed explicitly as described in the proof below.

Proof. Using the decomposition (2) and the fact that $\beta(F,G)$ is non-zero only if $F \subseteq G$, we get

$$P(E \mid \mathbf{x} \in F) \le 2\sum_{l>m}\;\sum_{G \supseteq F,\; G \in \mathcal{J}(l-1)} \beta(F,G)\,\gamma(G,\mathcal{P}).$$

Recall that $\mathcal{J}(r)$ is the set of all $r$-dimensional faces of $\mathcal{P}$.

To proceed, we need the following lemma, whose proof can be found in the appendix.

Lemma 3. Among all $(l-1)$-dimensional faces $G$ of $\mathcal{P}$ satisfying $F_k \subseteq G$, the face that maximizes $\beta(F_k,G)$ and $\gamma(G,\mathcal{P})$ is the one with $\frac{1}{w_1}\mathbf{e}_1, \frac{1}{w_2}\mathbf{e}_2, \ldots, \frac{1}{w_l}\mathbf{e}_l$ as its vertices.

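Using Lemma 3, and noting that there are $\binom{n-k}{l-k}2^{l-k}$ faces $G \in \mathcal{J}(l-1)$ containing $F_k$ (choose the $l-k$ additional indices and their signs), we obtain

$$P(E \mid \mathbf{x} \in F_k) \le 2\sum_{l=m+1}^{n}\binom{n-k}{l-k}2^{l-k}\,\beta(F_k,G_l)\,\gamma(G_l,\mathcal{P})$$

$$\le 2(n-m)\max_{\{l:\,l>m\}}\binom{n-k}{l-k}2^{l-k}\,\beta(F_k,G_l)\,\gamma(G_l,\mathcal{P}).$$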

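Using $\tau = \frac{l}{n}$, $\delta = \frac{k}{n}$ and Stirling's approximation for factorials, it can be shown that

$$\frac{1}{n}\log\left(\binom{n-k}{l-k}2^{l-k}\right) = (1-\delta)H\left(\frac{\tau-\delta}{1-\delta}\right) + (\tau-\delta)\log 2 + o(1),$$

where $H(x)$ is the entropy function with base $e$, defined by $H(x) = -x\log(x) - (1-x)\log(1-x)$. Define the combinatorial exponent $\psi_{com}(\delta,\tau)$ as

$$\psi_{com}(\delta,\tau) \triangleq (1-\delta)H\left(\frac{\tau-\delta}{1-\delta}\right) + (\tau-\delta)\log 2.$$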
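The Stirling step can be checked numerically: for large $n$ the exact quantity $\frac{1}{n}\log\big(\binom{n-k}{l-k}2^{l-k}\big)$, computed stably with `math.lgamma`, should approach $\psi_{com}(\delta,\tau)$, with a gap of order $\log(n)/n$. A minimal sketch with the illustrative values $\delta = 0.1$, $\tau = 0.4$:

```python
import math

def H(x):
    """Binary entropy with natural logarithm; H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def psi_com(delta, tau):
    return (1.0 - delta) * H((tau - delta) / (1.0 - delta)) + (tau - delta) * math.log(2.0)

def log_count_per_n(n, k, l):
    """(1/n) * log( C(n-k, l-k) * 2^(l-k) ), using lgamma to avoid huge integers."""
    log_binom = (math.lgamma(n - k + 1) - math.lgamma(l - k + 1)
                 - math.lgamma(n - l + 1))
    return (log_binom + (l - k) * math.log(2.0)) / n

for n in (10**3, 10**4, 10**6):
    k, l = n // 10, 4 * n // 10           # delta = 0.1, tau = 0.4
    print(n, log_count_per_n(n, k, l) - psi_com(0.1, 0.4))
```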



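So, we conclude

$$\frac{1}{n}\log\big(P(E \mid \mathbf{x} \in F_k)\big) \le \max_{\{\tau:\,\tau>\alpha\}}\;\psi_{com}(\delta,\tau) + \psi_{int}(\delta,\tau) + \psi_{ext}(\tau) + o(1).$$

The function $\psi_{tot}(\delta)$ defined by

$$\psi_{tot}(\delta) = \max_{\{\tau:\,\tau>\alpha\}}\;\psi_{com}(\delta,\tau) + \psi_{int}(\delta,\tau) + \psi_{ext}(\tau)$$

satisfies the condition given in Theorem 3 and hence establishes the theorem.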



C. Obtaining tighter exponents 

In Section III-B we developed a method that provides an upper bound on $P(E \mid \mathbf{x} \in F_k)$ in terms of internal and external angle exponents. For a given value of $l$, we bounded $\beta(F_k,G)$ and $\gamma(G,\mathcal{P})$ for each $G \in \mathcal{J}(l-1)$ by $\beta(F_k,G_l)$ and $\gamma(G_l,\mathcal{P})$ using Lemma 3. This bounding step is generally overly conservative because it does not take into account the variation of the function $f(\cdot)$ over the complete interval $[0,1]$. The quality of the bound is especially poor for choices of $f(\cdot)$ that are rapidly increasing. To improve on the bound, we can use a simple technique that takes into account the variation of $f(u)$ with respect to $u$ more accurately over the whole interval.

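Divide the set of indices $k+1, \ldots, n$ into two parts, $T_1 = \{k+1, \ldots, \frac{n+k}{2}\}$ and $T_2 = \{\frac{n+k}{2}+1, \ldots, n\}$. For a particular $l$, let $G$ have $l_1$ vertices in $T_1$ and $l_2$ vertices in $T_2$, in addition to the $k$ vertices of $F_k$. Using Lemma 3, among all faces $G$ with the values of $l_1$ and $l_2$ specified as above, the choice that maximizes $\beta(F_k,G)$ and $\gamma(G,\mathcal{P})$ is the face $G_{l_1,l_2}$ with vertices given by the indices $1, \ldots, k+l_1$ and $\frac{n+k}{2}+1, \ldots, \frac{n+k}{2}+l_2$. Using this we get a revised upper bound on $P(E \mid \mathbf{x} \in F_k)$:

$$P(E \mid \mathbf{x} \in F_k) \le 2\sum_{l=m+1}^{n}\;\sum_{l_1+l_2=l-k}\binom{\frac{n-k}{2}}{l_1}\binom{\frac{n-k}{2}}{l_2}2^{l-k}\,\beta(F_k,G_{l_1,l_2})\,\gamma(G_{l_1,l_2},\mathcal{P})$$

$$\le 2n^2\max_{l_1+l_2>m-k}\binom{\frac{n-k}{2}}{l_1}\binom{\frac{n-k}{2}}{l_2}2^{l_1+l_2}\,\beta(F_k,G_{l_1,l_2})\,\gamma(G_{l_1,l_2},\mathcal{P}).$$

Define $\gamma_1 = \frac{l_1}{n}$ and $\gamma_2 = \frac{l_2}{n}$. Then the revised bound on the exponent of $P(E \mid \mathbf{x} \in F_k)$ can be obtained as

$$\frac{1}{n}\log\big(P(E \mid \mathbf{x} \in F_k)\big) \le \max_{\gamma_1+\gamma_2>\alpha-\delta}\;\frac{1-\delta}{2}H\left(\frac{2\gamma_1}{1-\delta}\right) + \frac{1-\delta}{2}H\left(\frac{2\gamma_2}{1-\delta}\right) + (\gamma_1+\gamma_2)\log 2 + \psi_{int}(\delta,\gamma_1,\gamma_2) + \psi_{ext}(\gamma_1,\gamma_2) + o(1)$$

$$= \max_{\gamma_1+\gamma_2>\alpha-\delta}\;\psi_{com}(\delta,\gamma_1,\gamma_2) + \psi_{int}(\delta,\gamma_1,\gamma_2) + \psi_{ext}(\gamma_1,\gamma_2) + o(1),$$

where we define $\psi_{com}(\delta,\gamma_1,\gamma_2) = \frac{1-\delta}{2}H\left(\frac{2\gamma_1}{1-\delta}\right) + \frac{1-\delta}{2}H\left(\frac{2\gamma_2}{1-\delta}\right) + (\gamma_1+\gamma_2)\log 2$ and call it the combinatorial exponent. The expressions for $\psi_{int}(\delta,\gamma_1,\gamma_2)$ and $\psi_{ext}(\gamma_1,\gamma_2)$ can be obtained by using the methods of the previous subsection (we give the exact expressions shortly). The function

$$\psi_{tot}(\delta) = \max_{\gamma_1+\gamma_2>\alpha-\delta}\;\psi_{com}(\delta,\gamma_1,\gamma_2) + \psi_{int}(\delta,\gamma_1,\gamma_2) + \psi_{ext}(\gamma_1,\gamma_2)$$

now plays the role of the total exponent in Theorem 3.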

We can repeat the above argument for any $r$ by dividing the indices $k+1, \ldots, n$ into $r$ parts of equal size. Define the quantities $\gamma_i$, $i = 1, \ldots, r$, as $\gamma_i = \frac{l_i}{n}$, where $l_i$ is the number of indices of face $G$ in the $i$th interval. Also define $h_i = r\gamma_i$ and let $\mathbf{h} = [h_1, \ldots, h_r]^T$. The total exponent is now given by

$$\psi_{tot} = \max_{\mathbf{h}}\;\psi_{com}(\mathbf{h}) + \psi_{int}(\mathbf{h}) + \psi_{ext}(\mathbf{h}) \quad \text{subject to} \quad \frac{1}{r}\sum_{i=1}^{r}h_i > \alpha - \delta,$$

where $\psi_{int}(\mathbf{h}) = \max_y \psi_{int}(\mathbf{h},y)$, analogous to equation (10), and $\psi_{ext}(\mathbf{h}) = \max_x \psi_{ext}(\mathbf{h},x)$, analogous to equation (15).

The dependence of the exponent functions on other variables has been suppressed for compactness. So, the total exponent can be obtained by the following maximization:

$$\psi_{tot} = \max_{\mathbf{h},x,y}\;\psi_{com}(\mathbf{h}) + \psi_{int}(\mathbf{h},y) + \psi_{ext}(\mathbf{h},x) \quad \text{subject to} \quad \frac{1}{r}\sum_{i=1}^{r}h_i > \alpha - \delta. \qquad (16)$$



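We now compute the expressions for each of the functions appearing in the above maximization. For the subsequent derivation, we define $f_i = f\left(\delta + \frac{(i-1)(1-\delta)}{r}\right)$, $i = 1, \ldots, r$, the value of $f(\cdot)$ at the left endpoint of the $i$th interval.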



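1) Combinatorial Exponent: The combinatorial exponent is given by

$$\psi_{com}(\mathbf{h}) = \frac{1-\delta}{r}\sum_{i=1}^{r}H\left(\frac{h_i}{1-\delta}\right) + \left(\frac{1}{r}\sum_{i=1}^{r}h_i\right)\log 2.$$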
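The $r$-part combinatorial exponent is cheap to evaluate. In the sketch below (hypothetical helper names), taking $r = 1$ and $h_1 = \tau - \delta$ recovers $\psi_{com}(\delta,\tau)$ from Section III-B, and a midpoint check on sample points illustrates the concavity in $\mathbf{h}$ asserted for this term in Lemma 4. It is a sanity check, not a proof.

```python
import math

def H(x):
    """Binary entropy with natural logarithm; H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def psi_com_r(h, delta):
    """psi_com(h) = (1-delta)/r * sum_i H(h_i/(1-delta)) + (1/r) * sum_i h_i * log 2."""
    r = len(h)
    return ((1.0 - delta) / r * sum(H(hi / (1.0 - delta)) for hi in h)
            + sum(h) / r * math.log(2.0))

def psi_com_single(delta, tau):
    """The r = 1 combinatorial exponent of Section III-B."""
    return (1.0 - delta) * H((tau - delta) / (1.0 - delta)) + (tau - delta) * math.log(2.0)

print(psi_com_r([0.3], 0.1), psi_com_single(0.1, 0.4))   # r = 1, h_1 = tau - delta

h1, h2 = [0.1, 0.5, 0.2], [0.6, 0.05, 0.3]                # each h_i <= 1 - delta
h_mid = [(a + b) / 2.0 for a, b in zip(h1, h2)]
print(psi_com_r(h_mid, 0.1), 0.5 * (psi_com_r(h1, 0.1) + psi_com_r(h2, 0.1)))
```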



For the internal and external angle exponents, in addition to obtaining their expressions, we also bound them by analytically simpler expressions so that the optimization problem (16) becomes more tractable.

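2) External Angle Exponent: We start with

$$\zeta(x) = c_2 x^2 - G_0(x),$$

where

$$c_2 = \int_0^{\delta} f^2(u)\,du + \sum_{i=1}^r \int_{\delta + \frac{(i-1)(1-\delta)}{r}}^{\delta + \frac{(i-1)(1-\delta)}{r} + \frac{h_i}{r}} f^2(u)\,du,$$

$$G_0(x) = \int_{\delta}^{1} \log(\mathrm{erf}(x f(u)))\,du - \sum_{i=1}^r \int_{\delta + \frac{(i-1)(1-\delta)}{r}}^{\delta + \frac{(i-1)(1-\delta)}{r} + \frac{h_i}{r}} \log(\mathrm{erf}(x f(u)))\,du.$$

As $f(\cdot)$ is an increasing function, $f(u) \ge f_i$ on the $i$th interval, so each integral over the $i$th interval in the expression for $c_2$ can be bounded below by its left Riemann sum:

$$c_2 \ge \int_0^{\delta} f^2(u)\,du + \frac{1}{r}\sum_{i=1}^r f_i^2 h_i = \bar{c}_2.$$

Similarly, since $\mathrm{erf}(\cdot)$ is also increasing,

$$G_0(x) \le \int_{\delta}^{1} \log(\mathrm{erf}(x f(u)))\,du - \frac{1}{r}\sum_{i=1}^r h_i \log(\mathrm{erf}(x f_i)) = \bar{G}_0(x).$$

So,

$$\zeta(x) \ge \bar{c}_2 x^2 - \bar{G}_0(x) = \bar{\zeta}(x).$$

Combining, we obtain a simplified expression for the external angle exponent, which upper-bounds the exact exponent $-\zeta(x)$:

$$\psi_{ext}(x) = -\bar{\zeta}(x) = -\left(\bar{c}_2 x^2 - \bar{G}_0(x)\right).$$

The optimized external angle exponent is then given by

$$\psi_{ext} = \max_x \psi_{ext}(x).$$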
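The bounded external angle exponent can be evaluated numerically with a one-dimensional search over $x$. The sketch below is a minimal illustration, assuming SciPy is available, an increasing weight function (here $f(u) = 1 + u$, as in Figure 1) and illustrative values $\delta = 0.2$, $r = 4$, $h_i = 0.1$; the function name `psi_ext_bar` and the search interval for $x$ are our own choices and are not part of the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import erf


def psi_ext_bar(h, delta, f, x_max=20.0):
    """Simplified external angle exponent max_x -(c2_bar x^2 - G0_bar(x)).

    h     : length-r array of the per-interval quantities h_i
    delta : fraction k/n of indices in the leading face
    f     : increasing weight function on [0, 1]
    Returns (exponent, maximizing x).
    """
    h = np.asarray(h, dtype=float)
    r = h.size
    # Left endpoints of the r intervals partitioning [delta, 1]; f_i = f(a_i).
    a = delta + np.arange(r) * (1.0 - delta) / r
    f_i = f(a)

    # c2_bar = int_0^delta f^2 du + (1/r) sum_i f_i^2 h_i (left Riemann lower bound).
    c2_bar = quad(lambda u: f(u) ** 2, 0.0, delta)[0] + np.sum(f_i**2 * h) / r

    def G0_bar(x):
        # int_delta^1 log erf(x f(u)) du - (1/r) sum_i h_i log erf(x f_i)
        integral = quad(lambda u: np.log(erf(x * f(u))), delta, 1.0)[0]
        return integral - np.sum(h * np.log(erf(x * f_i))) / r

    def zeta_bar(x):
        return c2_bar * x**2 - G0_bar(x)

    # zeta_bar grows without bound as x -> 0+ and as x -> infinity,
    # so a bounded one-dimensional search over x is used.
    res = minimize_scalar(zeta_bar, bounds=(1e-6, x_max), method="bounded")
    return -res.fun, res.x


if __name__ == "__main__":
    value, x_star = psi_ext_bar(h=[0.1] * 4, delta=0.2, f=lambda u: 1.0 + u)
    print(f"psi_ext_bar = {value:.4f} at x = {x_star:.3f}")
```

Since $\bar{c}_2$ and $\bar{G}_0(x)$ are both linear in $h$, a routine like this can be called repeatedly inside an outer optimization over $h$.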



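3) Internal Angle Exponent: We start with

$$\xi(y) = \frac{c_0^2}{2c_1} y^2 + c_0 \Lambda_0^*(y),$$

where

$$c_0 = \frac{1}{r}\sum_{i=1}^r f_i h_i, \qquad c_1 = \int_0^{\delta} f^2(u)\,du,$$

and $\Lambda_0^*(\cdot)$ is the convex conjugate of $\Lambda_0(\cdot)$, given by

$$\Lambda_0^*(y) = \max_s\ sy - \Lambda_0(s), \qquad \Lambda_0(s) = \frac{1}{c_0}\sum_{i=1}^r \int_{\delta + \frac{(i-1)(1-\delta)}{r}}^{\delta + \frac{(i-1)(1-\delta)}{r} + \frac{h_i}{r}} \Lambda(s f(u))\,du.$$

We are interested only in the region $s < 0$. In this region we have

$$\Lambda_0(s) = \frac{1}{c_0}\sum_{i=1}^r \int_{\delta + \frac{(i-1)(1-\delta)}{r}}^{\delta + \frac{(i-1)(1-\delta)}{r} + \frac{h_i}{r}} \Lambda(s f(u))\,du \le \frac{1}{r c_0}\sum_{i=1}^r h_i \Lambda(s f_i) = \bar{\Lambda}_0(s),$$

which follows from the fact that $s < 0$ and $\Lambda(u)$ is an increasing function of $u$: on the $i$th interval $f(u) \ge f_i$, so $s f(u) \le s f_i$. This gives

$$\max_{s<0}\ sy - \Lambda_0(s) \ge \max_{s<0}\ sy - \bar{\Lambda}_0(s),$$

and hence

$$\Lambda_0^*(y) \ge \bar{\Lambda}_0^*(y) = \max_{s<0}\ sy - \bar{\Lambda}_0(s).$$

So we conclude

$$\xi(y) \ge \frac{c_0^2}{2c_1} y^2 + c_0 \bar{\Lambda}_0^*(y) = \bar{\xi}(y).$$

From the above we obtain a simplified expression for the internal angle exponent:

$$\psi_{int}(y) = -\left(\frac{1}{r}\sum_{i=1}^r h_i - \delta\right)\log 2 - \bar{\xi}(y) = -\left(\frac{1}{r}\sum_{i=1}^r h_i - \delta\right)\log 2 - \left(\frac{c_0^2}{2c_1} y^2 + c_0 \bar{\Lambda}_0^*(y)\right).$$

The optimized internal angle exponent is then obtained as

$$\psi_{int} = \max_y \psi_{int}(y).$$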
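The inner maximization over $s < 0$ and the outer optimization over $y$ in $\bar{\xi}(y)$ are both one-dimensional, so they can be sketched with nested bounded searches. This is an illustration only, assuming SciPy; $\Lambda(s) = s^2/2 + \log(2\Phi(s))$ is the half-normal cumulant generating function stated in Appendix B, $\log\Phi$ is evaluated with `scipy.special.log_ndtr` for numerical stability, and the truncation $s \ge -500$, the search range for $y$, and the test values of $h$, $\delta$ and $f$ are hypothetical choices. The internal angle exponent for a given $h$ is then the $\log 2$ term above minus the returned minimum.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import log_ndtr


def Lambda(s):
    """Half-normal cumulant generating function s^2/2 + log(2 Phi(s))."""
    return 0.5 * s**2 + np.log(2.0) + log_ndtr(s)


def xi_bar_min(h, delta, f, s_min=-500.0):
    """Return (min over y of xi_bar(y), minimizing y) for the bound above."""
    h = np.asarray(h, dtype=float)
    r = h.size
    f_i = f(delta + np.arange(r) * (1.0 - delta) / r)  # left endpoints
    c0 = np.sum(f_i * h) / r
    c1 = quad(lambda u: f(u) ** 2, 0.0, delta)[0]

    def Lambda0_bar(s):
        return np.sum(h * Lambda(s * f_i)) / (r * c0)

    def Lambda0_bar_conj(y):
        # s*y - Lambda0_bar(s) is concave in s; maximize it over s in [s_min, 0].
        res = minimize_scalar(lambda s: Lambda0_bar(s) - s * y,
                              bounds=(s_min, 0.0), method="bounded")
        return -res.fun

    def xi_bar(y):
        return c0**2 * y**2 / (2.0 * c1) + c0 * Lambda0_bar_conj(y)

    # Lambda0_bar'(0) = sqrt(2/pi) (the half-normal mean), and xi_bar is
    # increasing beyond that point, so the minimizing y lies below it.
    res = minimize_scalar(xi_bar, bounds=(1e-3, np.sqrt(2.0 / np.pi)),
                          method="bounded")
    return res.fun, res.x


if __name__ == "__main__":
    print(xi_bar_min(h=[0.1] * 4, delta=0.2, f=lambda u: 1.0 + u))
```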

To show that the maximization with respect to $h$ in (16) can be performed efficiently, we show that $\psi_{tot}(h, x, y)$ is a concave function of $h$ for fixed values of $x$ and $y$. This follows from the following lemma, whose proof can be found in Appendix A.

Lemma 4. The combinatorial, internal and external angle exponents are concave functions of $h$ for fixed values of $x$ and $y$.

The parameter $r$, which governs the number of evaluation points used to compute the angle exponents, gives us a way to control the accuracy of the computed exponents. If we can obtain a value $\bar{\delta}$ such that $\psi_{tot}(\delta) < 0$ for all $\delta < \bar{\delta}$, then Theorem 3 guarantees that whenever $\delta < \bar{\delta}$, weighted $\ell_1$-minimization recovers the corresponding sparse signal with overwhelming probability. We call this $\bar{\delta}$ the guaranteed bound on recoverable sparsity levels $\delta$. Increasing $r$ results in a tighter guaranteed bound, but at the expense of an increased cost in performing the optimization in (16). In Figure 1 we show numerically how this bound improves with the parameter $r$. For this, we fix a compression ratio $\alpha = \frac{m}{n} = 0.5$. The weights $w_i$ are chosen as $w_i = 1 + \frac{i}{n}$, which corresponds to the weight function $f(u) = 1 + u$. Note how the bound improves with increasing $r$ but saturates quickly. This indicates that a moderately large value of $r$ (e.g., $r = 30$ from the figure) suffices for accurate enough angle exponent computations.
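Given any routine that evaluates $\psi_{tot}(\delta)$, for instance by maximizing (16), the guaranteed bound $\bar{\delta}$ can be located by bisection. The sketch below assumes, as an illustration, that $\psi_{tot}$ is increasing in $\delta$, so that the set where it is negative is an interval starting at the left end of the search range; `toy_psi` is a hypothetical stand-in and not an exponent from the paper.

```python
def guaranteed_bound(psi_tot, lo=0.0, hi=0.5, tol=1e-6):
    """Approximate sup{delta in [lo, hi] : psi_tot(delta) < 0} by bisection.

    Assumes psi_tot is increasing in delta. Returns None if psi_tot(lo) >= 0,
    i.e. if no sparsity level in the range is certified.
    """
    if psi_tot(lo) >= 0:
        return None
    if psi_tot(hi) < 0:
        return hi
    # Invariant: psi_tot(lo) < 0 <= psi_tot(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_tot(mid) < 0:
            lo = mid
        else:
            hi = mid
    return lo


def toy_psi(delta):
    """Hypothetical increasing exponent that changes sign at delta = 0.2."""
    return delta - 0.2


if __name__ == "__main__":
    print(guaranteed_bound(toy_psi))
```

Returning the left end `lo` keeps the answer on the certified side: every value it returns satisfies $\psi_{tot} < 0$.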

Fig. 1: Guaranteed bound on recoverable $\delta$ vs $r$ for compression ratio $\alpha = 0.5$ and weight function $f(u) = 1 + u$, computed using the methods of Section III. As $r$ increases, the computed bound also increases, indicating an improvement in the tightness of the bound. Also, since the improvement saturates fairly fast, we can use a moderately large value of $r$ to obtain accurate enough angle exponents. The value $r = 30$ in the figure represents such a choice.

IV. Proof of Theorem 1

In this section, we establish the main result of this paper, Theorem 1. We do this in two steps. We first characterize the properties of a typical signal drawn from our model and provide a large deviation result on the probability that a signal drawn from our model does not satisfy this property. For signals which have this typical property, we then use the analysis method of Section III to find the exponent $\psi_{tot}(p, f)$ in the statement of Theorem 1. We start by writing

$$P(E) = \sum_{F \in \mathcal{F}} P(x \in F)\, P(E \mid x \in F).$$

To analyze this expression, we further split the sum into faces belonging to different "classes", which we describe below. Divide the set of indices $1, \ldots, n$ into $r$ equal parts, with $I_i$ denoting the $i$th interval of indices. Any face $F$ of the skewed cross-polytope $\mathcal{P}$ can be fully specified by the indices of its vertices (up to the signs of the vertices). For a given face $F$, let the set of indices representing the face $F$ be denoted by $I(F)$. For a given $k = (k_1, \ldots, k_r)$, denote by $\mathcal{F}(k)$ the set of all faces of $\mathcal{P}$ with $|I(F) \cap I_i| = k_i$. Also let $g_i = \frac{r k_i}{n}$. Recall that we denote by $E$ the failure event, i.e., the event that weighted $\ell_1$-minimization does not produce the correct solution. Then we have

$$P(E) = \sum_{k}\sum_{F \in \mathcal{F}(k)} P(x \in F)\, P(E \mid x \in F). \qquad (17)$$
We need to consider only one representative among the $2^{\sum_i k_i}$ faces created by the different sign patterns. This is because, by the symmetry of the problem, all these faces have the same probabilities $P(x \in F)$ and $P(E \mid x \in F)$. For simplicity, we always choose as the representative the face that lies in the first orthant.

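$$P(x \in F) = \prod_{i=1}^r \prod_{j \in I(F) \cap I_i} p\left(\frac{j}{n}\right) \prod_{j \in I_i \setminus I(F)} \left(1 - p\left(\frac{j}{n}\right)\right) = \left(\prod_{i=1}^r \prod_{j \in I(F) \cap I_i} \frac{p(j/n)}{1 - p(j/n)}\right) \prod_{j=1}^n \left(1 - p\left(\frac{j}{n}\right)\right).$$

The function $\frac{x}{1-x}$ is an increasing function of $x$ for $x \in (0, 1)$, and since $p(\cdot)$ is decreasing, $p(j/n) \le p\left(\frac{i-1}{r}\right)$ for every $j \in I_i$. So,

$$P(x \in F) \le \prod_{i=1}^r \left(\frac{p\left(\frac{i-1}{r}\right)}{1 - p\left(\frac{i-1}{r}\right)}\right)^{k_i} \prod_{j=1}^n \left(1 - p\left(\frac{j}{n}\right)\right).$$

Denote the right-hand side of the above inequality by $P(k)$, so that $P(x \in F) \le P(k)$ for every $F \in \mathcal{F}(k)$. Define

$$P(x \in \mathcal{F}(k)) = P(x \in F \text{ for some } F \in \mathcal{F}(k)).$$

Then

$$P(x \in \mathcal{F}(k)) \le \sum_{F \in \mathcal{F}(k)} P(x \in F) \le \left(\prod_{i=1}^r \binom{n/r}{k_i}\right) P(k).$$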

By Lemma 3, among all faces $F \in \mathcal{F}(k)$, the one which maximizes $P(E \mid x \in F)$ is the one obtained by stacking all the indices to the right of each interval. We denote this maximum probability by $P(E \mid k)$. Combining the above, we get

$$P(E) \le \sum_{k} P(x \in \mathcal{F}(k))\, P(E \mid k). \qquad (18)$$

As the function $p(u)$ is monotonically decreasing, $P(x \in \mathcal{F}(k))$ is not the same for all $k$. We now proceed to show that $P(x \in \mathcal{F}(k))$ is significant only when $k$ takes values in a "typical" set. For values of $k$ outside the typical set, $P(x \in \mathcal{F}(k))$ is exponentially small in $n$ and can be ignored in the sum in (18). For $k$ in the typical set, we use the methods of Section III to bound $P(E \mid k)$ and hence $P(E)$.

The following lemma provides bounds on $P(x \in \mathcal{F}(k))$ which motivate the definition of typicality that follows.

Lemma 5. Let $D(q \| p)$ denote the Kullback-Leibler distance between two Bernoulli random variables with probabilities of success $q$ and $p$, respectively. Define $p_i = r\int_{(i-1)/r}^{i/r} p(u)\,du$. If $\frac{1}{r}\sum_{i=1}^r D(g_i \| p_i) > \epsilon$, then there exists $n_0$ such that for all $n > n_0$,

$$P(x \in \mathcal{F}(k)) < e^{-b(\epsilon) n},$$

where $b(\epsilon)$ is a positive constant.
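Proof. We have

$$P(x \in \mathcal{F}(k)) \le \left(\prod_{i=1}^r \binom{n/r}{k_i}\right) P(k).$$

Then,

$$\frac{1}{n}\log P(x \in \mathcal{F}(k)) \le \frac{1}{n}\sum_{i=1}^r \log\binom{n/r}{k_i} + \frac{1}{n}\log P(k).$$

Letting $n \to \infty$,

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n}\log P(x \in \mathcal{F}(k)) &\le \frac{1}{r}\sum_{i=1}^r H(g_i) + \frac{1}{r}\sum_{i=1}^r g_i \log\left(\frac{p\left(\frac{i-1}{r}\right)}{1 - p\left(\frac{i-1}{r}\right)}\right) + \int_0^1 \log(1 - p(u))\,du \\
&= \frac{1}{r}\sum_{i=1}^r \left(H(g_i) + g_i \log p\left(\tfrac{i-1}{r}\right) + (1 - g_i)\log\left(1 - p\left(\tfrac{i-1}{r}\right)\right)\right) - \frac{1}{r}\sum_{i=1}^r \log\left(1 - p\left(\tfrac{i-1}{r}\right)\right) + \int_0^1 \log(1 - p(u))\,du \\
&= -\frac{1}{r}\sum_{i=1}^r D\left(g_i \,\middle\|\, p\left(\tfrac{i-1}{r}\right)\right) + A(r),
\end{aligned}$$

where $A(r)$ is the error in the Riemann sum, given by $A(r) = \int_0^1 \log(1 - p(u))\,du - \frac{1}{r}\sum_{i=1}^r \log\left(1 - p\left(\frac{i-1}{r}\right)\right)$.

Now further divide each of the $r$ intervals into $t$ equal parts, thus forming a total of $tr$ intervals. Let $g_{t,j}$ denote $\frac{tr}{n}$ times the number of indices of face $F$ in the $j$th interval of the new partition, and let $p_{t,j} = p\left(\frac{j-1}{tr}\right)$. Summing up the number of indices in all the intervals of length $\frac{1}{tr}$ contained in an interval of length $\frac{1}{r}$ gives $g_i = \frac{1}{t}\sum_{j=1}^t g_{t,t(i-1)+j}$. The calculations just carried out can be repeated for this new partition to give

$$\lim_{n\to\infty}\frac{1}{n}\log P(x \in \mathcal{F}(k)) \le -\frac{1}{tr}\sum_{i=1}^r\sum_{j=1}^t D(g_{t,t(i-1)+j} \| p_{t,t(i-1)+j}) + A(tr).$$

We now bound the term $\frac{1}{t}\sum_{j=1}^t D(g_{t,t(i-1)+j} \| p_{t,t(i-1)+j})$. We have

$$\frac{1}{t}\sum_{j=1}^t D(g_{t,t(i-1)+j} \| p_{t,t(i-1)+j}) = \frac{1}{t}\sum_{j=1}^t g_{t,t(i-1)+j}\log\left(\frac{g_{t,t(i-1)+j}}{p_{t,t(i-1)+j}}\right) + \frac{1}{t}\sum_{j=1}^t (1 - g_{t,t(i-1)+j})\log\left(\frac{1 - g_{t,t(i-1)+j}}{1 - p_{t,t(i-1)+j}}\right).$$

Using the log-sum inequality, we obtain

$$\begin{aligned}
\frac{1}{t}\sum_{j=1}^t D(g_{t,t(i-1)+j} \| p_{t,t(i-1)+j}) &\ge \left(\frac{1}{t}\sum_{j=1}^t g_{t,t(i-1)+j}\right)\log\left(\frac{\sum_{j=1}^t g_{t,t(i-1)+j}}{\sum_{j=1}^t p_{t,t(i-1)+j}}\right) + \left(\frac{1}{t}\sum_{j=1}^t (1 - g_{t,t(i-1)+j})\right)\log\left(\frac{\sum_{j=1}^t (1 - g_{t,t(i-1)+j})}{\sum_{j=1}^t (1 - p_{t,t(i-1)+j})}\right) \\
&= g_i \log\left(\frac{g_i}{p_{i,t}}\right) + (1 - g_i)\log\left(\frac{1 - g_i}{1 - p_{i,t}}\right) \\
&= D(g_i \| p_{i,t}),
\end{aligned}$$

where $p_{i,t} = \frac{1}{t}\sum_{j=1}^t p_{t,t(i-1)+j}$. Using this,

$$\lim_{n\to\infty}\frac{1}{n}\log P(x \in \mathcal{F}(k)) \le -\frac{1}{r}\sum_{i=1}^r D(g_i \| p_{i,t}) + A(tr).$$

Since this is true for every $t$, we let $t \to \infty$; then $p_{i,t} \to p_i$ and $A(tr) \to 0$, which gives

$$\lim_{n\to\infty}\frac{1}{n}\log P(x \in \mathcal{F}(k)) \le -\frac{1}{r}\sum_{i=1}^r D(g_i \| p_i).$$

So, if the condition of the lemma is satisfied, then

$$\lim_{n\to\infty}\frac{1}{n}\log P(x \in \mathcal{F}(k)) < -\epsilon,$$

and the claim in the lemma follows. □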
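Two ingredients of the proof can be checked numerically: on the exponential scale, a binomial probability behaves like $-D(q \| p)$ (the step that turns the binomial coefficients and the Bernoulli factors into Kullback-Leibler terms), and the log-sum inequality lets the average of divergences over sub-intervals dominate the divergence of the averages. The sketch below uses a single interval with a constant success probability and toy numbers chosen purely for illustration.

```python
import math


def bernoulli_kl(q, p):
    """D(q || p) for Bernoulli distributions, in nats (with 0 log 0 = 0)."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def log_prob_type(N, k, p):
    """(1/N) log[C(N, k) p^k (1 - p)^(N - k)]: the probability that N i.i.d.
    Bernoulli(p) trials give exactly k successes, on the exponential scale."""
    log_binom = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
    return (log_binom + k * math.log(p) + (N - k) * math.log(1 - p)) / N


# Method of types: the normalized log-probability approaches -D(k/N || p).
N, k, p = 10**6, 300_000, 0.2
print(log_prob_type(N, k, p), -bernoulli_kl(k / N, p))

# Log-sum inequality step: (1/t) sum_j D(g_j || p_j) >= D(mean g || mean p).
g_sub = [0.10, 0.25, 0.40]
p_sub = [0.30, 0.20, 0.10]
lhs = sum(bernoulli_kl(g, pp) for g, pp in zip(g_sub, p_sub)) / len(g_sub)
rhs = bernoulli_kl(sum(g_sub) / len(g_sub), sum(p_sub) / len(p_sub))
print(lhs, rhs, lhs >= rhs)
```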

Lemma 5 motivates the following natural definition of $\epsilon$-typicality.

Definition 1. Given $\epsilon > 0$, define $k = (k_1, \ldots, k_r)$ to be $\epsilon$-typical if $\frac{1}{r}\sum_{i=1}^r D(g_i \| p_i) \le \epsilon$.
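A small sketch of the typicality test in Definition 1, assuming the linear probability function (19) with $\delta = 0.185$ and $c = 0.26$ (values that appear in Section V) and an illustrative partition with $r = 4$ and $n = 1000$. The helper names are hypothetical; `interval_means` computes $p_i = r\int_{(i-1)/r}^{i/r} p(u)\,du$ by quadrature, and the typical face corresponds to $g_i = p_i$.

```python
import numpy as np
from scipy.integrate import quad


def interval_means(p, r):
    """p_i = r * int_{(i-1)/r}^{i/r} p(u) du for i = 1, ..., r."""
    return np.array([r * quad(p, (i - 1) / r, i / r)[0] for i in range(1, r + 1)])


def avg_kl(g, p_bar):
    """(1/r) sum_i D(g_i || p_i) for Bernoulli distributions (0 log 0 = 0)."""
    g = np.asarray(g, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(g > 0, g * np.log(g / p_bar), 0.0)
        t2 = np.where(g < 1, (1 - g) * np.log((1 - g) / (1 - p_bar)), 0.0)
    return float(np.mean(t1 + t2))


def is_typical(k, n, p, eps):
    """Check eps-typicality of k = (k_1, ..., k_r), using g_i = r k_i / n."""
    k = np.asarray(k, dtype=float)
    r = k.size
    g = r * k / n
    return avg_kl(g, interval_means(p, r)) <= eps


if __name__ == "__main__":
    p = lambda u: 0.185 - 0.26 * (u - 0.5)
    p_bar = interval_means(p, 4)
    print(p_bar)
    print(is_typical(1000 * p_bar / 4, 1000, p, 1e-3))   # the typical face
    print(is_typical([46, 46, 46, 46], 1000, p, 1e-3))   # ignores the tilt
```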

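Using this definition of $\epsilon$-typicality, we can now bound the probability of failure $P(E)$ using (18) as

$$P(E) \le \underbrace{\sum_{k\ \epsilon\text{-typical}} P(x \in \mathcal{F}(k))\, P(E \mid k)}_{I} + \underbrace{\sum_{k\ \text{not}\ \epsilon\text{-typical}} P(x \in \mathcal{F}(k))\, P(E \mid k)}_{II}.$$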

For a fixed value of $r$, the number of possible values of $(k_1, k_2, \ldots, k_r)$ is bounded by $n(n + r - 1)^{r-1}$. Since $P(E \mid k) \le 1$, the second term $II$ can be bounded as

$$II \le n(n + r - 1)^{r-1} e^{-c(\epsilon) n}$$

for some positive constant $c(\epsilon)$. As $\lim_{n\to\infty}\frac{1}{n}\log\left(n(n + r - 1)^{r-1}\right) = 0$, there exists $n_0$ such that for all $n > n_0$,

$$II \le e^{-c_0(\epsilon) n}$$

for some positive constant $c_0(\epsilon)$. This allows us to consider only the first term $I$, for $k$ which are $\epsilon$-typical. Among all $\epsilon$-typical $k$, let $F^*_p$ be the face which maximizes the probability $P(E \mid x \in F)$. Then

$$I \le P(E \mid x \in F^*_p).$$

This gives

$$\lim_{n\to\infty}\frac{1}{n}\log(I) \le \lim_{n\to\infty}\frac{1}{n}\log P(E \mid x \in F^*_p).$$

Also, since we can choose $\epsilon$ to be as small as we want, we can essentially consider just the case $\epsilon = 0$. This gives us $k = (k_1, \ldots, k_r)$ with $g_i = p_i$. This defines a fixed face $F_p$, which we call the typical face of $\mathcal{P}$, and we only need to bound the term $P(E \mid x \in F_p)$, which can be done by a simple generalization of the methods in Section III. The corresponding expressions for the optimized internal and external angle exponents can be found in Appendix B.
V. Numerical computations and simulations

In this section, we compute the bounds using the techniques developed in this paper for a specific probability function $p(\cdot)$ and weight function $f(\cdot)$. In particular, we choose $p(\cdot)$ to be a linear function whose slope is governed by a parameter $c$:

$$p(u) = \delta - c(u - 0.5), \quad u \in [0, 1]. \qquad (19)$$

The expected level of sparsity of signals drawn from this model is $\int_0^1 p(u)\,du = \delta$. The value of $c$ governs how tilted the signal model is. Larger values of $c$ result in a higher tilt, which means that a random signal drawn from this model will have most of its non-zero entries concentrated at the beginning. To recover signals drawn from this model, we use weighted $\ell_1$-minimization with weights defined by the weight function

$$f(u) = 1 + \rho u, \quad u \in [0, 1]. \qquad (20)$$

The value of the constant $\rho$ controls the variation of the weights used. As a general guideline, we need to choose higher values of $\rho$ for higher values of $c$. Later in this section we describe a method to make this choice based on the theoretical bounds computed in this paper.
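The signal model (19) and the weights (20) are straightforward to simulate, assuming, as in the model, that index $j$ is non-zero independently with probability $p(j/n)$ and that the weights are $w_j = f(j/n)$. The sketch assumes NumPy; the function names, the parameter values and the seed are illustrative choices.

```python
import numpy as np


def sample_support(n, delta, c, rng):
    """Boolean support mask: index j (1-based) is non-zero w.p. p(j/n), p from (19)."""
    u = np.arange(1, n + 1) / n
    p = delta - c * (u - 0.5)
    if np.any(p < 0) or np.any(p > 1):
        raise ValueError("p(u) = delta - c(u - 0.5) must stay in [0, 1]")
    return rng.random(n) < p


def weights(n, rho):
    """Weights w_j = f(j/n) with f(u) = 1 + rho * u, as in (20)."""
    return 1.0 + rho * np.arange(1, n + 1) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    masks = [sample_support(n, 0.185, 0.26, rng) for _ in range(200)]
    # The average fraction of non-zeros should be close to delta = 0.185.
    print(np.mean([mask.sum() for mask in masks]) / n)
    # With c > 0, most non-zeros fall in the first half of the indices.
    print(np.mean([mask[: n // 2].sum() for mask in masks]),
          np.mean([mask[n // 2:].sum() for mask in masks]))
```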

Before proceeding to evaluate the performance of weighted $\ell_1$-minimization in the problem specified by the above choice of functions, we first present theoretical bounds and simulations related to the behavior of the so-called family of leading faces $F_0^k$ when the corresponding cross-polytope is described by the function $f(\cdot)$. This is contained in Section V-A. The performance evaluation of weighted $\ell_1$-minimization for the model described above is deferred to Section V-B.

A. Behavior of the leading family of faces $F_0^k$ under random projection.

Recall that, for a given set of weights, the weighted $\ell_1$-ball is the cross-polytope in $n$ dimensions whose vertices in the first orthant are given by $\frac{1}{w_1}e_1, \ldots, \frac{1}{w_n}e_n$. For some $k$, consider the face with vertices $\frac{1}{w_1}e_1, \ldots, \frac{1}{w_k}e_k$; we call this face $F_0^k$. In Section III-B we developed sufficient conditions under which weighted $\ell_1$-minimization recovers, with overwhelming probability, a signal $x$ whose support is the first $k$ indices, whenever $k = \delta n$. We define the quantity

$$\bar{\delta} = \max \delta \quad \text{subject to} \quad \psi_{tot}(\delta) < 0,$$

where $\psi_{tot}(\delta)$ is the total exponent for the face $F_0^k$ with $k = \delta n$, as defined in (16). From the above definition and Theorem 1, it can be concluded that whenever $\delta < \bar{\delta}$, we are guaranteed to be able to recover the corresponding sparse signal with overwhelming probability. We call the quantity $\bar{\delta}$ the guaranteed bound on recoverable sparsity levels $\delta$. In view of our choice of weights, which is described by the function $f(u) = 1 + \rho u$ with $\rho \ge 0$, higher values of $\rho$ correspond to more steeply varying weights. Intuitively, one may expect that higher values of $\rho$ make the weighted $\ell_1$ norm cost function favor non-zero entries in the first few indices, which may allow the threshold $\bar{\delta}$ to be larger.

We will show that the quantity $\bar{\delta}$ follows the increasing trend described above. To demonstrate this for a certain choice of the parameters, we fix the compression ratio $\frac{m}{n} = 0.5$ and compute the bound using the methods developed in Section III-B. Based on Figure 1, we choose $r = 30$ as a reasonable value for the accuracy parameter in our computations. Figure 2 shows the dependence of this threshold on the value of $\rho$, which governs the weight function $f(\cdot)$. The value of the bound at $\rho = 0$ corresponds to the case when $F_0^{\delta n}$ is a face of the regular $\ell_1$ ball. This is the threshold for $\delta$ below which standard $\ell_1$-minimization succeeds in recovering a signal with sparsity level $\delta$ with overwhelming probability. As expected, this value matches the value reported earlier in [1].
Fig. 2: Guaranteed bound on recoverable $\delta$ vs $\rho$ for the leading face $F_0^{\delta n}$, computed using the methods of Section III-B for $r = 30$.



To evaluate the accuracy of the bound, we then compare the threshold predicted by the guaranteed bound to the one obtained empirically through simulations for two different values of the parameter $\rho$. For this, we set $m = 200$, $n = 400$, and obtain the fraction of times $x \in F_0^{\delta n}$ failed to be recovered correctly via weighted $\ell_1$-minimization out of a total of 500 iterations. This is done by randomly generating, for each iteration, a vector $x$ with support $\{1, \ldots, k\}$ and using weighted $\ell_1$-minimization to recover $x$ from its measurements $y = Ax$. Figure 3a and Figure 3b show this plot for $\rho = 0$ and $\rho = 1$, respectively. The vertical lines in the plots (marked A and B, respectively) denote the guaranteed bounds corresponding to the value of $\rho$ in Figure 2. The simulations show a rapid fall in the empirical value of $P(E \mid x \in F_0^{\delta n})$ around the theoretical guaranteed bound as we decrease the value of $\delta$. This indicates that the guaranteed bounds developed are fairly tight.
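Weighted $\ell_1$-minimization, $\min \sum_i w_i |x_i|$ subject to $Ax = y$, can be posed as a linear program by splitting $x = u - v$ with $u, v \ge 0$. The sketch below uses this standard reformulation with SciPy's `linprog` (HiGHS backend) and repeats the leading-face experiment at a much smaller, illustrative size; it is not the authors' simulation code, and the recovery tolerance `tol` is our own choice.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_l1_min(A, y, w):
    """Solve min sum_i w_i |x_i| s.t. A x = y as a linear program.

    With x = u - v and u, v >= 0, the objective becomes w^T u + w^T v.
    """
    m, n = A.shape
    cost = np.concatenate([w, w])
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


def leading_face_failure_rate(m, n, k, w, trials, rng, tol=1e-4):
    """Empirical P(E | x in F_0^k): support {1, ..., k}, i.i.d. Gaussian A."""
    failures = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        x = np.zeros(n)
        x[:k] = rng.standard_normal(k)
        x_hat = weighted_l1_min(A, A @ x, w)
        err = np.linalg.norm(x_hat - x)
        failures += int(err > tol * max(1.0, np.linalg.norm(x)))
    return failures / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 80
    w = 1.0 + np.arange(1, n + 1) / n  # f(u) = 1 + u, i.e. rho = 1
    for k in (5, 15, 25):
        print(k, leading_face_failure_rate(40, n, k, w, trials=20, rng=rng))
```

Setting `w = np.ones(n)` gives standard $\ell_1$-minimization, which is how the two methods can be compared on the same draws.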

B. Performance of weighted $\ell_1$-minimization.

In this subsection, we compute the theoretical bound on recoverable sparsity levels using the methods of this paper. We use the probability function and weight function in (19) and (20), respectively. The choice of $\rho$ plays an important role in the success of weighted $\ell_1$-minimization, and it would be of interest to be able to obtain the value of $\rho$ for which one gets the best performance. One way to estimate the effect of $\rho$ is to compute the guaranteed bound $\bar{\delta}$ as suggested in Section IV and observe the trend. We can then pick the value of $\rho$ which maximizes $\bar{\delta}$.

To demonstrate this via computations, we fix the ratio $\alpha = \frac{m}{n} = 0.5$ and compute the guaranteed bound on recoverable $\delta$ (which denotes the expected fraction of non-zero components of the signal) using the methods developed in Section IV. The accuracy parameter $r$ is fixed at 60. Figure 4 shows the dependence of $\bar{\delta}$ on the value of $\rho$ for three different values of $c$. The curves suggest that for larger values of $c$, which correspond to more rapidly decaying probabilities, the value $\rho = \rho^*(c)$ which maximizes $\bar{\delta}$ is also higher. At the same time, the value of $\bar{\delta}$ evaluated at $\rho = \rho^*(c)$ also increases with increasing $c$. This suggests that rapidly decaying probabilities allow us to recover less sparse signals by using an appropriate weighted $\ell_1$-minimization.

Fig. 3: Empirical probability of error $P(E \mid x \in F_0^{\delta n})$ vs $\delta$ for (a) $\rho = 0$ and (b) $\rho = 1$, obtained through simulations using $m = 200$, $n = 400$ over 100 experiments. The vertical line in each panel refers to the guaranteed bound on recoverable $\delta$ for the corresponding $\rho$, computed using the methods of Section III-B.

TABLE I: $c$ vs $\rho^*(c)$ using theoretical guaranteed bounds with $r = 60$ (Figure 4)

$c$            0.16    0.26    0.36
$\rho^*(c)$    0.6     1.0     1.8

To provide evidence that weighted $\ell_1$-minimization indeed improves performance, we conduct simulations to compare standard and weighted $\ell_1$-minimization. We fix the value of $\delta$ to be 0.185, which is known to be greater than the recoverable threshold for standard $\ell_1$-minimization. We then explore the effect of choosing different values of the model parameter $c$ in (19) on recoverability. We sample random signals with supports generated by the distribution imposed by $p(\cdot)$. We then use the curves computed in Figure 4 to make the best choice of $\rho$ (see Table I) and use weighted $\ell_1$-minimization with this $\rho$ to recover the generated signal from its measurements. We compute the fraction of experiments, out of 500, for which this method fails to recover the correct signal. The values of $m$ and $n$ are chosen to be 500 and 1000, respectively. To compare the performance of weighted $\ell_1$-minimization to standard $\ell_1$-minimization, we repeat the same procedure but use standard $\ell_1$-minimization to recover the signal. Figure 5 compares the values generated by each method. Notice how the performance of standard $\ell_1$-minimization remains more or less invariant with increasing $c$. This shows that standard $\ell_1$-minimization fails to exploit the extra information present because of the knowledge of $c$ (i.e., the decaying nature of the probabilities); its performance depends only on $\delta$, the expected fractional level of sparsity, and is insensitive to the tilt of the model given by $c$. On the other hand, the performance of weighted $\ell_1$-minimization improves with $c$.
Fig. 4: Guaranteed bound on recoverable $\delta$ vs $\rho$ for $\frac{m}{n} = 0.5$, computed using the methods of this paper, for $c = 0.16$, $c = 0.26$ and $c = 0.36$. The parameter $r$ is fixed at 60.



VI. Conclusion

In this paper we analyzed sparse signal recovery via weighted $\ell_1$-minimization for a special class of probabilistic signal models, namely when the weights are uniform samples of a continuous function. We leveraged the techniques developed in [2] and [10] to provide sufficient conditions under which weighted $\ell_1$-minimization succeeds in recovering the sparse signal with overwhelming probability. In the process, we also provided conditions under which a certain special class of faces of the skewed cross-polytope gets "swallowed" under random projections.

A question central to the weighted $\ell_1$-minimization based approach is the choice of optimal weights. The authors in [11] were able to answer this question for the simple class of models they considered. In this paper, we were able to do the same provided that we restrict our search to a certain family of weight functions. An interesting question to pursue in future work would be to characterize this family of functions in terms of the function $p(u)$ that specifies the prior probabilities.



Fig. 5: Empirical probability of error $P(E)$ of weighted $\ell_1$-minimization vs the tilt of the model given by the parameter $c$. Probability function $p(u) = 0.185 - c(u - 0.5)$ and weight function $f(u) = 1 + \rho^*(c)u$, where $\rho^*(c)$ is the optimal value of $\rho$ obtained from Figure 4. Problem size is given by $m = 500$, $n = 1000$. Number of experiments = 500.

Appendix A
Proof of Lemmas 3 and 4

A. Proof of Lemma 3

Let $G_0$ be the face whose vertices are given by $\frac{1}{w_1}e_1, \ldots, \frac{1}{w_k}e_k, \frac{1}{w_{k+1}}e_{k+1}, \ldots, \frac{1}{w_l}e_l$. Let $G$ be any face other than $G_0$ whose vertices are given by $\frac{1}{w_1}e_1, \ldots, \frac{1}{w_k}e_k, \frac{1}{w_{n_{k+1}}}e_{n_{k+1}}, \ldots, \frac{1}{w_{n_l}}e_{n_l}$, where $k < n_{k+1} < \cdots < n_l$. Consider forming a sequence of faces $G^0, G^1, \ldots, G^{l-k}$, where $G^0 = G$, and $G^{j}$ is obtained from $G^{j-1}$ by swapping the vertices $\frac{1}{w_{n_{l-j+1}}}e_{n_{l-j+1}}$ and $\frac{1}{w_{l-j+1}}e_{l-j+1}$, so that $G^{l-k} = G_0$. Since $w_{n_{l-j+1}} \ge w_{l-j+1}$, the expression in Lemma 2 for the external angle increases at each step. Hence, $\gamma(G_0, \mathcal{P}) \ge \gamma(G, \mathcal{P})$.

For a fixed value of $l$, the exponent for the internal angle is only affected by the term $p_Z(0)$ in the expression for the internal angle exponent in Lemma 1. Also,

$$p_Z(0) = 2\int_0^\infty v\, p_{Y_0}(v) F_S(v)\,dv.$$

Following the same procedure as above for generating the sequence of faces $G^j$, it can be seen that at each step the variance of some $Y_p$ is decreased while the other $Y_p$'s are kept unchanged. Thus $F_S(v)$ in the above expression for $p_Z(0)$ increases at each step. Thus, the exponent for $\beta(F_0, G_0)$ is greater than or equal to that of $\beta(F_0, G)$.

B. Proof of Lemma 4

We rewrite the expression for the combinatorial exponent from Section III-C1:

$$\psi_{com}(h) = \frac{1-\delta}{r}\sum_{i=1}^r H\left(\frac{h_i}{1-\delta}\right) + \left(\frac{1}{r}\sum_{i=1}^r h_i - \delta\right)\log 2.$$

The concavity of this function follows from the fact that the standard entropy function $H(\cdot)$ is concave (the remaining term is linear in $h$). From its expression in Section III-C2, we observe that the external angle exponent is a linear function of $h$ and hence a concave function. The concavity of the internal angle exponent is slightly more involved, and we spend the rest of this section proving it.

The internal angle exponent as computed in Section III-C3 is given by

$$\psi_{int}(h, y) = -\left(\frac{1}{r}\sum_{i=1}^r h_i - \delta\right)\log 2 - \left(\frac{c_0^2}{2c_1} y^2 + c_0 \bar{\Lambda}_0^*(y)\right).$$

The quantity $c_0 = c_0(h) = \frac{1}{r}\sum_{i=1}^r f_i h_i$ is a linear function of $h$. So $\frac{c_0^2}{2c_1} y^2$ is a convex quadratic function of $h$. Therefore it suffices to prove that $F(h) = c_0(h)\bar{\Lambda}_0^*(y)$ is convex in $h$. Recall that

$$\bar{\Lambda}_0^*(y) = \bar{\Lambda}_0^*(y, h) = \max_{s<0}\ sy - \frac{1}{r c_0(h)}\sum_{i=1}^r h_i \Lambda(s f_i).$$

Therefore

$$F(h) = \max_{s<0}\ c_0(h) s y - \frac{1}{r}\sum_{i=1}^r h_i \Lambda(s f_i).$$

Since the argument of the above maximization is linear in $h$ for each fixed $s$, $F(h)$ is a pointwise maximum of linear functions of $h$ and is therefore convex in $h$.

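Appendix B
Angle exponents for the typical face $F_p$

Divide the interval $[0, 1]$ into $r$ equally spaced intervals. Let the face $F$ under consideration have $n\delta_i$ indices in the $i$th interval. Also, let $g_i = r\delta_i$. The asymptotic exponents for this face can be obtained by a straightforward generalization of the procedure described in Section III-A. We give the final expressions for the combinatorial, internal and external angle exponents for a given value of $g = (g_1, g_2, \ldots, g_r)^T$. In what follows, $f_i$ denotes the value of $f(\cdot)$ at the $i$th grid point of this partition.

1) Combinatorial Exponent:

$$\psi_{com}(h) = \frac{1}{r}\sum_{i=1}^r (1 - g_i) H\left(\frac{h_i}{1 - g_i}\right) + \frac{1}{r}\sum_{i=1}^r (h_i - g_i)\log 2,$$

where $H(\cdot)$ is the binary entropy function with base $e$.

2) Internal Angle Exponent: The internal angle exponent is given by

$$\psi_{int}(h, y) = -\frac{1}{r}\sum_{i=1}^r (h_i - g_i)\log 2 - \left(\frac{c_0^2}{2c_1} y^2 + c_0 \Lambda_0^*(y)\right),$$

where

$$c_0 = \frac{1}{r}\sum_{j=1}^r f_j h_j, \qquad c_1 = \frac{1}{r}\sum_{i=1}^r f_i^2 g_i,$$

$$\Lambda_0(s) = \frac{1}{r c_0}\sum_{i=1}^r h_i \Lambda(s f_i), \qquad \Lambda_0^*(y) = \max_s\ sy - \Lambda_0(s).$$

Here $\Lambda(u) = \frac{u^2}{2} + \log(2\Phi(u))$, with $\Phi(\cdot)$ the standard normal cumulative distribution function, is the cumulant generating function (the logarithm of the moment generating function) of the standard half-normal distribution.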
3) External Angle Exponent: The external angle exponent is given by

$$\psi_{ext}(h, x) = -\left(c_2 x^2 - \log(G_0(x))\right),$$

where

$$c_2 = \frac{1}{r}\sum_{i=1}^r f_i^2 (g_i + h_i), \quad \text{and} \quad \log(G_0(x)) = \int_0^1 \log(\mathrm{erf}(x f(u)))\,du - \frac{1}{r}\sum_{i=1}^r \log(\mathrm{erf}(x f_i))(h_i + g_i).$$

4) Total Exponent: Combining the exponents, we define the total exponent as

$$\psi_{tot} = \max_{h, x, y}\ \psi_{com}(h) + \psi_{int}(h, y) + \psi_{ext}(h, x) \quad \text{subject to} \quad \frac{1}{r}\sum_{i=1}^r h_i \ge \alpha - \delta, \quad 0 \le h_i \le 1 - g_i,$$

where $\delta = \frac{1}{r}\sum_{i=1}^r g_i$. From Theorem 3, the total exponent satisfies

$$\frac{1}{n}\log P(E \mid x \in F) \le \psi_{tot} + o(1).$$

So as long as $\psi_{tot} < 0$, weighted $\ell_1$-minimization succeeds in recovering the sparse signal with an exponentially small probability of failure.